Abstract

We study the compressed sensing reconstruction problem for a broad class of random, band-diagonal sensing matrices. This construction is inspired by the idea of spatial coupling in coding theory. As demonstrated heuristically and numerically by Krzakala et al. [KMS+11], message passing algorithms can effectively solve the reconstruction problem for spatially coupled measurements with undersampling rates close to the fraction of non-zero coordinates.

We use an approximate message passing (AMP) algorithm and analyze it through the state evolution method. We give a rigorous proof that this approach is successful as soon as the undersampling rate δ exceeds the (upper) Rényi information dimension of the signal, $\overline{d}(p_X)$. More precisely, for a sequence of signals of diverging dimension n whose empirical distribution converges to $p_X$, reconstruction is with high probability successful from $\overline{d}(p_X)\,n + o(n)$ measurements taken according to a band diagonal matrix.

For sparse signals, i.e. sequences of dimension n and k(n) non-zero entries, this implies reconstruction from k(n) + o(n) measurements. For 'discrete' signals, i.e. signals whose coordinates take a fixed finite set of values, this implies reconstruction from o(n) measurements. The result is robust with respect to noise, does not apply uniquely to random signals, but requires the knowledge of the empirical distribution of the signal $p_X$.

1 Introduction and main results
1.1 Background and contributions 


Assume that m linear measurements are taken of an unknown n-dimensional signal $x \in \mathbb{R}^n$, according to the model






$$y = Ax. \tag{1}$$

The reconstruction problem requires us to reconstruct x from the measured vector $y \in \mathbb{R}^m$ and the measurement matrix $A \in \mathbb{R}^{m\times n}$.

It is an elementary fact of linear algebra that the reconstruction problem will not have a unique solution unless $m \ge n$. This observation is, however, challenged within compressed sensing. A large corpus of research shows that, under the assumption that x is sparse, a dramatically smaller number of measurements is sufficient [Don06a, CRT06a, Don06b]. Namely, if only k entries of x are non-vanishing, then roughly $m \ge 2k\log(n/k)$ measurements are sufficient for A random, and reconstruction can be solved efficiently by convex programming. Deterministic sensing matrices achieve similar performance, provided they satisfy a suitable restricted isometry condition [CT05]. On top of this, reconstruction is robust with respect to the addition of noise [CRT06b, DMM11], i.e. under the model



$$y = Ax + w, \tag{2}$$

with, say, $w \in \mathbb{R}^m$ a random vector with i.i.d. components $w_i \sim \mathrm{N}(0, \sigma^2)$. In this context, the notions of 'robustness' or 'stability' refer to the existence of universal constants C such that the per-coordinate mean square error in reconstructing x from the noisy observation y is upper bounded by $C\sigma^2$.

From an information-theoretic point of view it remains, however, unclear why we cannot achieve the same goal with far fewer than $2k\log(n/k)$ measurements. Indeed, we can interpret Eq. (1) as describing an analog data compression process, with y a compressed version of x. From this point of view, we can encode all the information about x in a single real number $y \in \mathbb{R}$ (i.e. use m = 1), because the cardinality of $\mathbb{R}$ is the same as that of $\mathbb{R}^n$. Motivated by this puzzling remark, Wu and Verdú [WV10] introduced a Shannon-theoretic analogue of compressed sensing, whereby the vector x has i.i.d. components $x_i \sim p_X$. Crucially, the distribution $p_X$ is available to, and may be used by, the reconstruction algorithm. Under the mild assumptions that sensing is linear (as per Eq. (1)), and that the reconstruction mapping is Lipschitz continuous, they proved that compression is asymptotically lossless if and only if

$$m \ge n\,\overline{d}(p_X) + o(n). \tag{3}$$

Here $\overline{d}(p_X)$ is the (upper) Rényi information dimension of the distribution $p_X$. We refer to Section 1.2 for a precise definition of this quantity. Suffice it to say that, if $p_X$ is ε-sparse (i.e. if it puts mass at most ε on nonzeros) then $\overline{d}(p_X) \le \varepsilon$. Also, if $p_X$ is the convex combination of a discrete part (sum of Dirac deltas) and an absolutely continuous part (with a density), then $\overline{d}(p_X)$ is equal to the weight of the absolutely continuous part.
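The cardinality remark above can be made concrete with a toy encoder of our own (it is not taken from [WV10], and the helpers `interleave_encode` and `interleave_decode` are hypothetical). It packs n coordinates in [0, 1), each truncated to k decimal digits, into a single number by interleaving their digits. Decoding is exact, but the inverse map is far from Lipschitz, which is exactly the kind of reconstruction map the Lipschitz assumption rules out.

```python
def interleave_encode(coords, k):
    """Pack integers 0 <= c < 10**k (coordinate c stands for the real c / 10**k)
    into one integer N whose decimal digits interleave those of the coordinates.
    N stands for the real number N / 10**(len(coords) * k), which lies in [0, 1)."""
    if any(not 0 <= c < 10**k for c in coords):
        raise ValueError("each coordinate must lie in [0, 10**k)")
    padded = [str(c).zfill(k) for c in coords]
    # first digit of every coordinate, then second digit of every coordinate, ...
    return int("".join(s[j] for j in range(k) for s in padded))


def interleave_decode(code, n, k):
    """Invert interleave_encode for n coordinates of k digits each."""
    s = str(code).zfill(n * k)
    return [int("".join(s[j * n + i] for j in range(k))) for i in range(n)]


x = [123, 7, 999]                      # stands for (0.123, 0.007, 0.999)
code = interleave_encode(x, k=3)       # 109209379, i.e. the real number 0.109209379
assert interleave_decode(code, n=3, k=3) == x

# Moving the single 'measurement' by 1e-9 moves the last coordinate by 1e-3.
print(interleave_decode(code - 1, n=3, k=3))   # [123, 7, 998]
```

The amplification factor in this toy is $10^{(n-1)k}$, which grows without bound with n, so a single real measurement suffices only in the complete absence of noise.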

This result is quite striking. For instance, it implies that, for random k-sparse vectors, $m \ge k + o(n)$ measurements are sufficient. Also, if the entries of x are random and take values in, say, $\{-10, -9, \dots, +9, +10\}$, then a sublinear number of measurements, $m = o(n)$, is sufficient! At the same time, the result of Wu and Verdú presents two important limitations. First, it does not provide robustness guarantees¹ of the type described above. It therefore leaves open the possibility that reconstruction is highly sensitive to noise when m is significantly smaller than the number of measurements required in classical compressed sensing, namely $\Omega(k\log(n/k))$ for k-sparse vectors. Second, it does not provide any computationally practical algorithm for reconstructing x from the measurements y.



¹ While this paper was about to be posted, we became aware of a paper by Wu and Verdú [WV11b] claiming that the boundary $\delta = \overline{D}(p_X)$ (see below for a definition of $\overline{D}(p_X)$) is achievable in principle by the Bayes minimum mean square error rule. Their result seems to be conditional on the validity of the replica method in this setting, which is not yet proved.
In an independent line of work, Krzakala et al. [KMS+11] developed an approach that leverages the idea of spatial coupling. This idea was introduced into the compressed sensing literature by Kudekar and Pfister [KP10] (see [KRU11] and Section 1.5 for a discussion of earlier work on this topic). Spatially coupled matrices are, roughly speaking, random sensing matrices with a band-diagonal structure. The analogy is, this time, with channel coding². In this context, spatial coupling, in conjunction with message-passing decoding, makes it possible to achieve Shannon capacity on memoryless communication channels. By analogy, it is reasonable to hope that a similar approach might make it possible to sense random vectors x at an undersampling rate m/n close to the Rényi information dimension of the coordinates of x, $\overline{d}(p_X)$. Indeed, the authors of [KMS+11] evaluate this approach numerically on a few classes of random vectors and demonstrate that it indeed achieves rates close to the fraction of non-zero entries. They also support this claim by insightful statistical physics arguments.

In this paper, we fill the gap between the above works, and present the following contributions: 

Construction. We describe a construction for spatially coupled sensing matrices A that is somewhat broader than the one of [KMS+11] and give precise prescriptions for the asymptotics of various parameters. We also use a somewhat different reconstruction algorithm from the one in [KMS+11], by building on the approximate message passing (AMP) approach of [DMM09, DMM10]. AMP algorithms have the advantage of smaller memory complexity with respect to standard message passing, and of smaller computational complexity whenever fast multiplication procedures are available for A.

Rigorous proof of convergence. Our main contribution is a rigorous proof that the above approach indeed achieves the information-theoretic limits set out by Wu and Verdú [WV10]. Indeed, we prove that, for sequences of spatially coupled sensing matrices $\{A(n)\}_{n\in\mathbb{N}}$, $A(n) \in \mathbb{R}^{m(n)\times n}$, with asymptotic undersampling rate $\delta = \lim_{n\to\infty} m(n)/n$, AMP reconstruction is with high probability successful in recovering the signal x, provided $\delta > \overline{d}(p_X)$.

Robustness to noise. We prove that the present approach is robust³ to noise in the following sense. For any signal distribution $p_X$ and undersampling rate δ, there exists a constant C such that the output $\hat{x}(y)$ of the reconstruction algorithm achieves a mean square error per coordinate $n^{-1}\mathbb{E}\{\|\hat{x}(y) - x\|_2^2\} \le C\sigma^2$. This result holds under the noisy measurement model (2) for a broad class of noise models for w, including i.i.d. noise coordinates $w_i$ with $\mathbb{E}\{w_i^2\} = \sigma^2 < \infty$.

Non-random signals. Our proof does not apply uniquely to random signals x with i.i.d. components, but indeed to more general sequences of signals $\{x(n)\}_{n\in\mathbb{N}}$, $x(n) \in \mathbb{R}^n$, indexed by their dimension n. The conditions required are: (1) that the empirical distribution of the coordinates of x(n) converges (weakly) to $p_X$; and (2) that $\|x(n)\|_2^2/n$ converges to the second moment of the asymptotic law $p_X$.

Interestingly, the present framework changes the notion of 'structure' that is relevant for reconstructing the signal x. Indeed, the focus is shifted from the sparsity of x to the information dimension $\overline{d}(p_X)$.



² Unlike [KMS+11], we follow here the terminology developed within coding theory.

³ This robustness bound holds for all $\delta > \overline{D}(p_X)$, where $\overline{D}(p_X) = \overline{d}(p_X)$ for a broad class of distributions $p_X$ (including distributions without singular continuous component). When $\overline{d}(p_X) < \overline{D}(p_X)$, a somewhat weaker robustness bound holds for $\overline{d}(p_X) < \delta \le \overline{D}(p_X)$.
In the rest of this section we state our results formally, and discuss their implications and limitations, as well as relations with earlier work. Section 2.3 provides a precise description of the matrix construction and reconstruction algorithm. Section 3 reduces the proof of our main results to two key lemmas. One of these lemmas is a (quite straightforward) generalization of the state evolution technique of [DMM09, BM11a]. The second lemma characterizes the behavior of the state evolution recursion, and is proved in Section 5. The proofs of a number of intermediate technical steps are deferred to the appendices.

1.2 Formal statement of the results 

We consider the noisy model (2). An instance of the problem is therefore completely specified by the triple (x, w, A). We will be interested in the asymptotic properties of sequences of instances indexed by the problem dimensions $\mathcal{S} = \{(x(n), w(n), A(n))\}_{n\in\mathbb{N}}$. We recall a definition from [BM11b]. (More precisely, [BM11b] introduces the B = 1 case of this definition.)

Definition 1.1. The sequence of instances $\mathcal{S} = \{x(n), w(n), A(n)\}_{n\in\mathbb{N}}$ indexed by n is said to be a B-converging sequence if $x(n) \in \mathbb{R}^n$, $w(n) \in \mathbb{R}^m$, $A(n) \in \mathbb{R}^{m\times n}$ with $m = m(n)$ is such that $m/n \to \delta \in (0,\infty)$, and in addition the following conditions hold:

(a) The empirical distribution of the entries of x(n) converges weakly to a probability measure $p_X$ on $\mathbb{R}$ with bounded second moment. Further $n^{-1}\sum_{i=1}^n x_i(n)^2 \to \mathbb{E}_{p_X}\{X^2\}$.

(b) The empirical distribution of the entries of w(n) converges weakly to a probability measure $p_W$ on $\mathbb{R}$ with bounded second moment. Further $m^{-1}\sum_{i=1}^m w_i(n)^2 \to \mathbb{E}_{p_W}\{W^2\} = \sigma^2$.

(c) If $\{e_i\}_{1\le i\le n}$, $e_i \in \mathbb{R}^n$, denotes the canonical basis, then $\limsup_{n\to\infty}\max_{i\in[n]}\|A(n)e_i\|_2 \le B$ and $\liminf_{n\to\infty}\min_{i\in[n]}\|A(n)e_i\|_2 \ge 1/B$.

We further say that $\{(x(n), w(n))\}_{n\ge 0}$ is a converging sequence of instances if they satisfy conditions (a) and (b). We say that $\{A(n)\}_{n\ge 0}$ is a B-converging sequence of sensing matrices if they satisfy condition (c) above. We say $\mathcal{S}$ is a converging sequence if it is B-converging for some B.

Finally, if the sequence $\{(x(n), w(n), A(n))\}_{n\ge 0}$ is random, the above conditions are required to hold almost surely.

Notice that standard normalizations of the sensing matrix correspond to $\|A(n)e_i\|_2^2 \approx 1$ (and hence B = 1) or to $\|A(n)e_i\|_2^2 \approx m(n)/n$. Since throughout we assume $m(n)/n \to \delta \in (0,\infty)$, these conventions only differ by a rescaling of the noise variance. In order to simplify the proofs, we allow ourselves somewhat more freedom by taking B a fixed constant.
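For concreteness, a common way to satisfy condition (c) with B close to 1 is to draw i.i.d. Gaussian entries of variance 1/m. The sketch below is our illustration only (it is not the spatially coupled ensemble, and `gaussian_sensing_matrix` is a hypothetical helper): it checks that all column norms concentrate around 1, and shows that switching to the other normalization is a single global rescaling.

```python
import numpy as np

def gaussian_sensing_matrix(m, n, rng):
    """i.i.d. N(0, 1/m) entries, so that E ||A e_i||_2^2 = 1 for every column i."""
    return rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))

rng = np.random.default_rng(0)
n, delta = 2000, 0.5
m = int(delta * n)
A = gaussian_sensing_matrix(m, n, rng)

col_norms = np.linalg.norm(A, axis=0)       # ||A e_i||_2 for every column
print(col_norms.min(), col_norms.max())     # both close to 1, so B slightly above 1 works

# The convention ||A e_i||_2^2 ~ m/n is a global rescaling by c = sqrt(m/n):
# y = c A x + w is equivalent to y / c = A x + w / c, i.e. noise variance sigma^2 / c^2.
c = np.sqrt(m / n)
A_alt = c * A
```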

Given a sensing matrix A, and a vector of measurements y, a reconstruction algorithm produces an estimate $\hat{x}(A; y) \in \mathbb{R}^n$ of x. In this paper we assume that the empirical distribution $p_X$ and the noise level $\sigma^2$ are known to the estimator, and hence the mapping $\hat{x}: (A, y) \mapsto \hat{x}(A; y)$ implicitly depends on $p_X$ and $\sigma^2$. Since, however, $p_X$ and $\sigma^2$ are fixed throughout, we avoid the cumbersome notation $\hat{x}(A, y, p_X, \sigma^2)$.

Given a converging sequence of instances $\mathcal{S} = \{x(n), w(n), A(n)\}_{n\in\mathbb{N}}$ and an estimator $\hat{x}$, we define the asymptotic per-coordinate reconstruction mean square error as

$$\mathrm{MSE}(\mathcal{S}; \hat{x}) = \limsup_{n\to\infty} \frac{1}{n}\big\|\hat{x}(A(n); y(n)) - x(n)\big\|_2^2. \tag{4}$$
Notice that the quantity on the right-hand side depends on the matrix A(n), which will be random, and on the signal and noise vectors x(n), w(n), which can themselves be random. Our results hold almost surely with respect to these random variables. In some applications it is more customary to take the expectation with respect to the noise and signal distribution, i.e. to consider the quantity

$$\overline{\mathrm{MSE}}(\mathcal{S}; \hat{x}) = \limsup_{n\to\infty} \frac{1}{n}\mathbb{E}\big\|\hat{x}(A(n); y(n)) - x(n)\big\|_2^2. \tag{5}$$

It turns out that the almost sure bounds imply, in the present setting, bounds on the expected mean square error $\overline{\mathrm{MSE}}$ as well.

In this paper we study a specific low-complexity estimator, based on the AMP algorithm first proposed in [DMM09]. This proceeds by the following iteration (initialized with $x_i^1 = \mathbb{E}_{p_X}X$ for all $i \in [n]$):

$$x^{t+1} = \eta_t\big(x^t + (Q^t \odot A)^* r^t\big), \tag{6}$$

$$r^t = y - Ax^t + b^t \odot r^{t-1}. \tag{7}$$

Here, for each t, $\eta_t: \mathbb{R}^n \to \mathbb{R}^n$ is a differentiable non-linear function that depends on the input distribution $p_X$. Further, $\eta_t$ is separable⁴, namely, for a vector $v \in \mathbb{R}^n$, we have $\eta_t(v) = (\eta_{1,t}(v_1), \dots, \eta_{n,t}(v_n))$. The matrix $Q^t \in \mathbb{R}^{m\times n}$ and the vector $b^t \in \mathbb{R}^m$ can be efficiently computed from the current state $x^t$ of the algorithm, $\odot$ indicates the Hadamard (entrywise) product and $X^*$ denotes the transpose of matrix X. Further, $Q^t$ does not depend on the problem instance and hence can be precomputed. Both $Q^t$ and $b^t$ are block-constant. This property makes their evaluation, storage and manipulation particularly convenient. We refer to the next section for explicit definitions of these quantities. In particular, the specific choice of $\eta_{i,t}$ is dictated by the objective of minimizing the mean square error at iteration t + 1, and hence takes the form of a Bayes optimal estimator for the prior $p_X$. In order to stress this point, we will occasionally refer to this as the Bayes optimal AMP algorithm.
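As a minimal sketch of iteration (6)-(7), consider the uncoupled special case in which A has i.i.d. N(0, 1/m) entries. Then, in effect, $Q^t$ is the all-ones matrix and $b^t$ is a constant vector equal to $m^{-1}\sum_i \eta_{i,t}'$, which recovers the AMP iteration of [DMM09]. We assume the prior $p_X = (1-\varepsilon)\delta_0 + \varepsilon\,\mathrm{N}(0,1)$ and estimate the effective noise level by $\|r^t\|_2^2/m$; these choices and the helper names are ours, and the sketch does not reproduce the block-constant $Q^t$, $b^t$ of the spatially coupled construction.

```python
import numpy as np

def bg_denoiser(v, tau2, eps):
    """Posterior mean E[X | X + tau Z = v] and its derivative in v,
    for the Bernoulli-Gaussian prior X ~ (1 - eps) delta_0 + eps N(0, 1)."""
    # log-odds that X != 0: V ~ N(0, 1 + tau2) if X != 0, V ~ N(0, tau2) if X = 0
    log_odds = (np.log(eps / (1 - eps)) - 0.5 * np.log((1 + tau2) / tau2)
                + 0.5 * v**2 * (1 / tau2 - 1 / (1 + tau2)))
    pi = 0.5 * (1 + np.tanh(0.5 * log_odds))      # numerically stable sigmoid
    m1 = v / (1 + tau2)                            # E[X | v, X != 0]
    mean = pi * m1
    second = pi * (m1**2 + tau2 / (1 + tau2))      # E[X^2 | v]
    # for Gaussian noise, d/dv E[X | v] = Var(X | v) / tau^2
    return mean, (second - mean**2) / tau2

def amp(y, A, eps, iters=30):
    """Bayes-optimal AMP for an i.i.d. N(0, 1/m) matrix (uncoupled case)."""
    m, n = A.shape
    x = np.zeros(n)                                # x^1 = E[X] = 0 under this prior
    r = y.copy()
    for _ in range(iters):
        tau2 = np.sum(r**2) / m                    # empirical effective noise level
        x, deriv = bg_denoiser(x + A.T @ r, tau2, eps)
        r = y - A @ x + r * np.sum(deriv) / m      # residual with Onsager correction
    return x

rng = np.random.default_rng(1)
n, m, eps, sigma = 2000, 1000, 0.1, 1e-3
x0 = rng.normal(size=n) * (rng.random(n) < eps)
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
y = A @ x0 + sigma * rng.normal(size=m)
x_hat = amp(y, A, eps)
print(np.mean((x_hat - x0) ** 2))                  # small compared with E[X^2] = eps
```

The Onsager term $r^t \cdot m^{-1}\sum_i \eta'_{i,t}$ is what distinguishes AMP from plain iterative thresholding; dropping it breaks the Gaussian effective-noise picture on which state evolution relies.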

We denote by $\mathrm{MSE}_{\rm AMP}(\mathcal{S}; \sigma^2)$ the mean square error achieved by the Bayes optimal AMP algorithm, where we made explicit the dependence on $\sigma^2$. Since the AMP estimate depends on the iteration number t, the definition of $\mathrm{MSE}_{\rm AMP}(\mathcal{S}; \sigma^2)$ requires some care. The basic point is that we need to iterate the algorithm only for a constant number of iterations, as n gets large. Formally, we let

$$\mathrm{MSE}_{\rm AMP}(\mathcal{S}; \sigma^2) \equiv \lim_{t\to\infty}\limsup_{n\to\infty} \frac{1}{n}\big\|x^t(A(n); y(n)) - x(n)\big\|^2. \tag{8}$$

As discussed above, limits will be shown to exist almost surely when the instances (x(n), w(n), A(n)) are random, and almost sure upper bounds on $\mathrm{MSE}_{\rm AMP}(\mathcal{S}; \sigma^2)$ will be proved. (Indeed $\mathrm{MSE}_{\rm AMP}(\mathcal{S}; \sigma^2)$ turns out to be deterministic.) On the other hand, one might be interested in the expected error

$$\overline{\mathrm{MSE}}_{\rm AMP}(\mathcal{S}; \sigma^2) = \lim_{t\to\infty}\limsup_{n\to\infty} \frac{1}{n}\mathbb{E}\big\{\|x^t(A(n); y(n)) - x(n)\|^2\big\}. \tag{9}$$

We will tie the success of our compressed sensing scheme to the fundamental information-theoretic limit established in [WV10]. The latter is expressed in terms of the Rényi information dimension of the probability measure $p_X$.

⁴ We refer to [DJM11] for a study of non-separable denoisers in AMP algorithms.
Definition 1.2. Let $p_X$ be a probability measure over $\mathbb{R}$, and $X \sim p_X$. The upper and lower information dimension of $p_X$ are defined as

$$\overline{d}(p_X) = \limsup_{\ell\to\infty} \frac{H([X]_\ell)}{\log \ell}, \tag{10}$$

$$\underline{d}(p_X) = \liminf_{\ell\to\infty} \frac{H([X]_\ell)}{\log \ell}. \tag{11}$$

Here $H(\cdot)$ denotes Shannon entropy and, for $x \in \mathbb{R}$, $[x]_\ell \equiv \lfloor \ell x\rfloor/\ell$, and $\lfloor x\rfloor \equiv \max\{k \in \mathbb{Z} : k \le x\}$. If the limsup and liminf coincide, then we let $d(p_X) = \overline{d}(p_X) = \underline{d}(p_X)$.

Whenever the limit of $H([X]_\ell)/\log\ell$ exists and is finite, the Rényi information dimension can also be characterized as follows. Write the binary expansion of X, $X = D_0.D_1D_2D_3\ldots$ with $D_i \in \{0, 1\}$ for $i \ge 1$. Then $d(p_X)$ is the entropy rate of the stochastic process $\{D_1, D_2, D_3, \dots\}$. It is also convenient to recall the following result from [Ren59, WV10].

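Proposition 1.3 ([Ren59, WV10]). Let $p_X$ be a probability measure over $\mathbb{R}$, and $X \sim p_X$. Assume $H(\lfloor X\rfloor)$ to be finite. If $p_X = (1-\varepsilon)\nu_d + \varepsilon\tilde{\nu}$ with $\nu_d$ a discrete distribution (i.e. with countable support), then $\overline{d}(p_X) \le \varepsilon$. Further, if $\tilde{\nu}$ has a density with respect to Lebesgue measure, then $d(p_X) = \overline{d}(p_X) = \underline{d}(p_X) = \varepsilon$. In particular, if $\mathbb{P}\{X \neq 0\} \le \varepsilon$ then $\overline{d}(p_X) \le \varepsilon$.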

In order to present our result concerning robust reconstruction, we need the definition of the MMSE dimension of the probability measure $p_X$.

Given the signal distribution $p_X$, we let mmse(s) denote the minimum mean square error in estimating $X \sim p_X$ from a noisy observation in Gaussian noise, at signal-to-noise ratio s. Formally

$$\mathrm{mmse}(s) \equiv \inf_{\eta:\mathbb{R}\to\mathbb{R}} \mathbb{E}\big\{[X - \eta(\sqrt{s}\,X + Z)]^2\big\}, \tag{12}$$

where $Z \sim \mathrm{N}(0, 1)$. Since the minimum mean square error estimator is just the conditional expectation, this is given by

$$\mathrm{mmse}(s) = \mathbb{E}\big\{[X - \mathbb{E}[X \mid Y]]^2\big\}, \qquad Y = \sqrt{s}\,X + Z. \tag{13}$$



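Notice that mmse(s) is naturally well defined for $s = \infty$, with $\mathrm{mmse}(\infty) = 0$. We will therefore interpret it as a function $\mathrm{mmse}: \overline{\mathbb{R}}_+ \to \overline{\mathbb{R}}_+$ where $\overline{\mathbb{R}}_+ \equiv [0, \infty]$ is the completed non-negative real line.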

We recall the inequality

$$0 \le \mathrm{mmse}(s) \le \frac{1}{s}, \tag{14}$$

obtained by the estimator $\eta(y) = y/\sqrt{s}$. A finer characterization of the scaling of mmse(s) is provided by the following definition.

Definition 1.4 ([WV11a]). The upper and lower MMSE dimension of the probability measure $p_X$ over $\mathbb{R}$ are defined as

$$\overline{D}(p_X) = \limsup_{s\to\infty}\; s\cdot\mathrm{mmse}(s), \tag{15}$$

$$\underline{D}(p_X) = \liminf_{s\to\infty}\; s\cdot\mathrm{mmse}(s). \tag{16}$$

If the limsup and liminf coincide, then we let $D(p_X) = \overline{D}(p_X) = \underline{D}(p_X)$.
It is also convenient to recall the following result from [WV11a].

Proposition 1.5 ([WV11a]). If $\mathbb{E}\{X^2\} < \infty$, then

$$\underline{D}(p_X) \le \underline{d}(p_X) \le \overline{d}(p_X) \le \overline{D}(p_X). \tag{17}$$

Hence, if $D(p_X)$ exists, then $d(p_X)$ exists and $D(p_X) = d(p_X)$. In particular, this is the case if $p_X = (1-\varepsilon)\nu_d + \varepsilon\tilde{\nu}$ with $\nu_d$ a discrete distribution (i.e. with countable support), and $\tilde{\nu}$ has a density with respect to Lebesgue measure.
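Proposition 1.5 can be probed numerically for the prior $(1-\varepsilon)\delta_0 + \varepsilon\,\mathrm{N}(0,1)$ (zero mean and unit variance are our choices). The sketch below estimates mmse(s) from Eq. (13) by Monte Carlo, using the closed-form posterior mean; `bg_mmse` is a hypothetical helper. One should expect $s\cdot\mathrm{mmse}(s)$ to approach $D(p_X) = \varepsilon$ from above, and only slowly, because small nonzero entries that are mistaken for zeros inflate the error at moderate s.

```python
import numpy as np

def bg_mmse(s, eps, num_samples=1_000_000, seed=0):
    """Monte Carlo estimate of mmse(s), Eq. (13), for X ~ (1 - eps) delta_0 + eps N(0, 1)."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=num_samples) * (rng.random(num_samples) < eps)
    y = np.sqrt(s) * x + rng.normal(size=num_samples)
    # Y | X = 0 ~ N(0, 1) and Y | X != 0 ~ N(0, 1 + s): posterior log-odds of X != 0
    log_odds = (np.log(eps / (1 - eps)) - 0.5 * np.log1p(s)
                + 0.5 * y**2 * s / (1 + s))
    pi = 0.5 * (1 + np.tanh(0.5 * log_odds))
    x_hat = pi * np.sqrt(s) * y / (1 + s)      # E[X | Y], the posterior mean
    return np.mean((x - x_hat) ** 2)

eps = 0.1
for s in (1e1, 1e2, 1e3, 1e4):
    print(s, s * bg_mmse(s, eps))              # slowly approaches D(p_X) = eps
```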

We are now in position to state our main results. 

Theorem 1.6. Let $p_X$ be a probability measure on the real line and assume

$$\delta > \overline{d}(p_X). \tag{18}$$

Then there exists a random converging sequence of sensing matrices $\{A(n)\}_{n\ge 0}$, $A(n) \in \mathbb{R}^{m\times n}$, $m(n)/n \to \delta$ (with distribution depending only on δ), for which the following holds. For any ε > 0, there exists $\sigma_0$ such that for any converging sequence of instances $\{(x(n), w(n))\}_{n\ge 0}$ with parameters $(p_X, \sigma^2, \delta)$ and $\sigma \in [0, \sigma_0]$, we have, almost surely

$$\mathrm{MSE}_{\rm AMP}(\mathcal{S}; \sigma^2) \le \varepsilon. \tag{19}$$

Further, under the same assumptions, we have $\overline{\mathrm{MSE}}_{\rm AMP}(\mathcal{S}; \sigma^2) \le \varepsilon$.

Theorem 1.7. Let $p_X$ be a probability measure on the real line and assume

$$\delta > \overline{D}(p_X). \tag{20}$$

Then there exists a random converging sequence of sensing matrices $\{A(n)\}_{n\ge 0}$, $A(n) \in \mathbb{R}^{m\times n}$, $m(n)/n \to \delta$ (with distribution depending only on δ) and a finite stability constant $C = C(p_X, \delta)$, such that the following is true. For any converging sequence of instances $\{(x(n), w(n))\}_{n\ge 0}$ with parameters $(p_X, \sigma^2, \delta)$, we have, almost surely

$$\mathrm{MSE}_{\rm AMP}(\mathcal{S}; \sigma^2) \le C\sigma^2. \tag{21}$$

Further, under the same assumptions, we have $\overline{\mathrm{MSE}}_{\rm AMP}(\mathcal{S}; \sigma^2) \le C\sigma^2$.



Notice that, by Proposition 1.5, $\overline{D}(p_X) \ge \overline{d}(p_X)$, and $\overline{D}(p_X) = \overline{d}(p_X)$ for a broad class of probability measures $p_X$, including all measures that do not have a singular continuous component (i.e. that decompose into a pure point mass component and an absolutely continuous component).

The noiseless model (1) is covered as a special case of Theorem 1.6 by taking $\sigma^2 \downarrow 0$. For the reader's convenience, we state the result explicitly as a corollary.

Corollary 1.8. Let $p_X$ be a probability measure on the real line. Then, for any $\delta > \overline{d}(p_X)$ there exists a random converging sequence of sensing matrices $\{A(n)\}_{n\ge 0}$, $A(n) \in \mathbb{R}^{m\times n}$, $m(n)/n \to \delta$ (with distribution depending only on δ) such that, for any sequence of vectors $\{x(n)\}_{n\ge 0}$ whose empirical distribution converges to $p_X$, the Bayes optimal AMP asymptotically almost surely recovers x(n) from m(n) measurements $y = A(n)x(n) \in \mathbb{R}^{m(n)}$. (Namely, $\mathrm{MSE}_{\rm AMP}(\mathcal{S}; 0) = 0$ almost surely, and $\overline{\mathrm{MSE}}_{\rm AMP}(\mathcal{S}; 0) = 0$.)
1.3 Discussion

Theorem 1.6 and Corollary 1.8 are, in many ways, puzzling. It is instructive to spell out in detail a few specific examples, and discuss interesting features.

Example 1 (Bernoulli-Gaussian signal). Consider a Bernoulli-Gaussian distribution

$$p_X = (1-\varepsilon)\,\delta_0 + \varepsilon\,\gamma_{\mu,\sigma}, \tag{22}$$

where $\gamma_{\mu,\sigma}(\mathrm{d}x) = (2\pi\sigma^2)^{-1/2}\exp\{-(x-\mu)^2/(2\sigma^2)\}\,\mathrm{d}x$ is the Gaussian measure with mean μ and variance $\sigma^2$. This model has been studied numerically in a number of papers, including [BSB10, KMS+11]. By Proposition 1.3, we have $d(p_X) = \varepsilon$, and by Proposition 1.5, $D(p_X) = d(p_X) = \varepsilon$ as well.

Construct random signals $x(n) \in \mathbb{R}^n$ by sampling i.i.d. coordinates $x(n)_i \sim p_X$. The Glivenko-Cantelli theorem implies that the empirical distribution of the coordinates of x(n) converges almost surely to $p_X$, hence we can apply Corollary 1.8 to recover x(n) from $m(n) = n\varepsilon + o(n)$ measurements $y(n) \in \mathbb{R}^{m(n)}$. Notice that the number of non-zero entries in x(n) is, almost surely, $k(n) = n\varepsilon + o(n)$.

Hence, we can restate the implication of Corollary 1.8 as follows. A sequence of vectors x(n) with Bernoulli-Gaussian distribution and k(n) nonzero entries can almost surely be recovered from $m(n) = k(n) + o(n)$ measurements.
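The concentration $k(n) = n\varepsilon + o(n)$ used above is easy to see in simulation. This is a sketch sampling Example 1 with NumPy; `bernoulli_gaussian` is a hypothetical helper, and μ = 0, unit variance are our illustrative choices.

```python
import numpy as np

def bernoulli_gaussian(n, eps, mu=0.0, sd=1.0, rng=None):
    """Draw x in R^n with i.i.d. coordinates from (1 - eps) delta_0 + eps N(mu, sd^2)."""
    rng = np.random.default_rng() if rng is None else rng
    support = rng.random(n) < eps                  # each coordinate nonzero w.p. eps
    return np.where(support, rng.normal(mu, sd, size=n), 0.0)

rng = np.random.default_rng(0)
for n in (10**3, 10**5, 10**6):
    x = bernoulli_gaussian(n, eps=0.1, rng=rng)
    # k(n)/n -> eps, so m(n) = k(n) + o(n) is the same as n * eps + o(n)
    print(n, np.count_nonzero(x) / n)
```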

Example 2 (Mixture signal with a point mass). The above remarks generalize immediately to arbitrary mixture distributions of the form

$$p_X = (1-\varepsilon)\,\delta_0 + \varepsilon\,q, \tag{23}$$

where q is a measure that is absolutely continuous with respect to Lebesgue measure, i.e. $q(\mathrm{d}x) = f(x)\,\mathrm{d}x$ for some measurable function f. Then, by Proposition 1.3, we have $d(p_X) = \varepsilon$, and by Proposition 1.5, $D(p_X) = \varepsilon$ as well. Arguing as above we have the following.



Consequence 1.9. Let $\{x(n)\}_{n\ge 0}$ be a sequence of vectors with i.i.d. components $x(n)_i \sim p_X$ where $p_X$ is a mixture distribution as per Eq. (23). Denote by k(n) the number of nonzero entries in x(n). Then, almost surely as $n \to \infty$, Bayes optimal AMP recovers the signal x(n) from $m(n) = k(n) + o(n)$ spatially coupled measurements.



Under the regularity hypotheses of [WV10], no scheme can do substantially better, i.e. reconstruct x(n) from m(n) measurements if $\limsup_{n\to\infty} m(n)/k(n) < 1$.

One way to think about this result is the following. If an oracle gave us the support of x(n), we would still need $m(n) \ge k(n) - o(n)$ measurements to reconstruct the signal. Indeed, the entries in the support have distribution q, and $d(q) = 1$. Corollary 1.8 implies that the measurement overhead for estimating the support of x(n) is sublinear, o(n), even when the support is of order n.

It is sometimes informally argued that compressed sensing requires at least $\Omega(k\log(n/k))$ measurements for 'information-theoretic reasons', namely that specifying the support requires about $nH(k/n) \approx k\log(n/k)$ bits. This argument is of course incomplete because it assumes that each measurement $y_i$ is described by a bounded number of bits. Lower bounds of the form $m \ge C\,k\log(n/k)$ are proved in the literature, but they do not contradict our results. Specifically, [Wai09, ASZ10] prove information-theoretic lower bounds on the required number of measurements, under specific constructions for the random sensing matrix A. Further, these papers focus on the specific problem of exact support recovery. The paper [RWY09] proves minimax bounds for reconstructing vectors belonging to $\ell_p$-balls.
However, as the noise variance tends to zero, these bounds depend on the sensing matrix in a way that is difficult to quantify. In particular, they provide no explicit lower bound on the number of measurements required for exact recovery in the noiseless limit. Similar bounds were obtained for arbitrary measurement matrices in [CD11]. Again, these lower bounds vanish as noise tends to zero as soon as $m(n) \ge k(n)$.
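The bit count in the informal argument above is easy to reproduce. The sketch compares $\log_2\binom{n}{k}$, the number of bits needed to specify a support of size k, with the $nH(k/n)$ and $k\log(n/k)$ approximations (n and k are illustrative values of our choosing; the helper names are hypothetical). The three counts are of the same order, which is why the heuristic sounds convincing; its flaw, as explained above, is the assumption that each real-valued measurement carries a bounded number of bits.

```python
import math

def log2_binomial(n, k):
    """log2 of the number of supports of size k in [n], computed via lgamma."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)) / math.log(2)

def binary_entropy_bits(p):
    """Binary entropy H(p) in bits."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

n, k = 10**6, 10**4
print(log2_binomial(n, k))               # exact bit count for the support
print(n * binary_entropy_bits(k / n))    # the n H(k/n) approximation (an upper bound)
print(k * math.log2(n / k))              # the cruder k log(n/k) approximation (a lower bound)
```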

A different line of work derives lower bounds from Gelfand width arguments [Don06a, KT07]. These lower bounds are only proved to be a necessary condition for a stronger reconstruction guarantee. Namely, these works require the vector of measurements y = Ax to enable recovery for all k-sparse vectors $x \in \mathbb{R}^n$. This corresponds to the 'strong' phase transition of [DT05, Don06b], and is also referred to as the 'for all' guarantee in the computer science literature [BGI+08].

The lower bound that comes closest to the present setting is the 'randomized' lower bound [BIPW10]. The authors use an elegant communication complexity argument to show that $m(n) = \Omega(k(n)\log(n/k(n)))$ is necessary for achieving stable recovery with an $\ell_1$-$\ell_1$ error guarantee. This is a stronger stability condition than what is achieved in Theorem 1.7, allowing for a more powerful noise process. Indeed, the same paper also proves that recovery is possible from $m(n) = O(k(n))$ measurements under stronger conditions.

Example 3 (Discrete signal). Let K > 0 be a fixed integer, $a_1, \dots, a_K \in \mathbb{R}$, and $(p_1, p_2, \dots, p_K)$ be a collection of non-negative numbers that add up to one. Consider the probability distribution that puts mass $p_i$ on each $a_i$

$$p_X = \sum_{i=1}^K p_i\,\delta_{a_i}, \tag{24}$$

and let x(n) be a signal with i.i.d. coordinates $x(n)_i \sim p_X$. By Proposition 1.3, we have $d(p_X) = 0$. As above, the empirical distribution of the coordinates of the vectors x(n) converges to $p_X$. By applying Corollary 1.8 we obtain the following.



Consequence 1.10. Let $\{x(n)\}_{n\ge 0}$ be a sequence of vectors with i.i.d. components $x(n)_i \sim p_X$ where $p_X$ is a discrete distribution as per Eq. (24). Then, almost surely as $n \to \infty$, Bayes optimal AMP recovers the signal x(n) from $m(n) = o(n)$ spatially coupled measurements.
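Why $d(p_X) = 0$ for Eq. (24): since $[X]_\ell$ is a function of X, $H([X]_\ell) \le H(X) \le \log K$, so the ratio in Eq. (10) is at most $\log K/\log\ell \to 0$. The sketch below evaluates the ratio exactly for the uniform distribution on $\{-10, \dots, +10\}$ mentioned in Section 1.1 (uniform weights and the function name are our choices).

```python
import math

def entropy_ratio_discrete(probs, atoms, ell):
    """Exact H([X]_ell) / log(ell) for discrete X with P(X = atoms[i]) = probs[i],
    where [x]_ell = floor(ell * x) / ell and ell is a positive integer."""
    cells = {}
    for p, a in zip(probs, atoms):
        c = math.floor(ell * a)
        cells[c] = cells.get(c, 0.0) + p    # atoms in the same cell are merged
    H = -sum(p * math.log(p) for p in cells.values() if p > 0)
    return H / math.log(ell)

atoms = list(range(-10, 11))                # the values {-10, -9, ..., +9, +10}
probs = [1 / len(atoms)] * len(atoms)       # uniform weights
for ell in (10, 10**3, 10**6, 10**12):
    # here H([X]_ell) = log 21 for every ell, so the ratio is log(21) / log(ell) -> 0
    print(ell, entropy_ratio_discrete(probs, atoms, ell))
```

Because the bound is $\log K/\log\ell$, a distribution with many levels K needs a much finer resolution before its information dimension 'looks' like zero.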

It is important to further discuss the last statement because the reader might be misled into too optimistic a conclusion. Consider any signal $x \in \mathbb{R}^n$. For practical purposes, this will be represented with finite precision, say as a vector of ℓ-bit numbers. Hence, in practice, the distribution $p_X$ is always discrete, with $K = 2^\ell$. One might conclude from the above that a sublinear number of measurements $m(n) = o(n)$ is sufficient for any signal.



This is of course too optimistic. The key point is that Theorem 1.6 and Corollary 1.8 are asymptotic statements. As demonstrated in [KMS+11], for some classes of signals this asymptotic behavior is already relevant when n is of the order of a few thousands. On the other hand, the same will not be true for a discrete signal with a large number of levels K (which is the case for an ℓ-bit representation as in the above example, with ℓ moderately large). In particular, a necessary condition in that case is $n \gg K$. It would of course be important to substantiate or refine such a rule of thumb by numerical simulations or non-asymptotic bounds.

Example 4 (A discrete-continuous mixture). Consider the probability distribution

$$p_X = \varepsilon_+\,\delta_{+1} + \varepsilon_-\,\delta_{-1} + \varepsilon\,q, \tag{25}$$
where $\varepsilon_+ + \varepsilon_- + \varepsilon = 1$ and the probability measure q has a density with respect to Lebesgue measure. Again, let x(n) be a vector with i.i.d. components $x(n)_i \sim p_X$. We can apply Corollary 1.8 to conclude that $m(n) = n\varepsilon + o(n)$ spatially coupled measurements are sufficient. This should be contrasted with the case of sensing matrices with i.i.d. entries studied in [DT10] under convex reconstruction methods. In this case $m(n) = n(1+\varepsilon)/2 + o(n)$ measurements are necessary.
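For a sense of scale, take the illustrative values ε = 0.1 and $n = 10^6$ (our choice) and drop the o(n) terms:

$$m_{\rm coupled} \approx n\varepsilon = 10^5, \qquad m_{\rm i.i.d.,\ convex} \approx \frac{n(1+\varepsilon)}{2} = 5.5\times 10^5,$$

so in this instance the spatially coupled scheme needs about 5.5 times fewer measurements.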

In the next section we describe the basic intuition behind the surprising phenomenon in Theorems 1.6 and 1.7, and why spatially coupled sensing matrices are so useful. We conclude by stressing once more the limitations of these results:

• The Bayes-optimal AMP algorithm requires knowledge of the signal distribution $p_X$. Notice however that only a good approximation of $p_X$ (call it $\tilde{p}_X$, and denote by $\tilde{X}$ the corresponding random variable) is sufficient. Assume indeed that $p_X$ and $\tilde{p}_X$ can be coupled in such a way that $\mathbb{E}\{(X - \tilde{X})^2\} \le \tilde{\sigma}^2$. Then

$$x = \tilde{x} + u, \tag{26}$$

where $\|u\|_2^2 \lesssim n\tilde{\sigma}^2$. This is roughly equivalent to adding to the noise vector w a further 'noise' term with variance $\tilde{\sigma}^2/\delta$. Indeed, it can be shown that the guarantee in Theorem 1.7 degrades gracefully as $\tilde{p}_X$ gets different from $p_X$. Finally, it was demonstrated numerically in [VS11, KMS+11] that, in some cases, a good 'proxy' for $p_X$ can be learnt through an EM-style iteration.



• As mentioned above, the guarantees in Theorems 1.6 and 1.7 are only asymptotic. It would be important to develop analogous non-asymptotic results.



• The stability bound (21) is non-uniform, in that the proportionality constant C depends on the signal distribution. It would be important to establish analogous bounds that are uniform over suitable classes of distributions. (We do not expect Eq. (21) to hold uniformly over all distributions.)
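A quick numerical check of the heuristic in the first item, that a prior mismatch acts like extra measurement noise of variance $\tilde\sigma^2/\delta$: writing $y = A\tilde{x} + (Au + w)$, the term $Au$ plays the role of noise. We assume, for this illustration only, that u has i.i.d. $\mathrm{N}(0, \tilde\sigma^2)$ entries (the text only assumes a second-moment bound) and that A has i.i.d. $\mathrm{N}(0, 1/m)$ entries, so that $\|Ae_i\|_2^2 \approx 1$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 4000, 2000
delta, sigma_tilde = m / n, 0.05
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))   # ||A e_i||_2^2 ~ 1
u = sigma_tilde * rng.normal(size=n)    # mismatch x - x_tilde, assumed i.i.d. N(0, sigma_tilde^2)
extra = A @ u                           # y = A x_tilde + (A u + w): A u acts as extra noise
print(np.mean(extra**2), sigma_tilde**2 / delta)   # both close to 0.005
```

The per-measurement variance is $\sum_i A_{ai}^2 u_i^2 \approx (n/m)\,\tilde\sigma^2 = \tilde\sigma^2/\delta$, which is where the factor $1/\delta$ comes from.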



1.4 How does spatial coupling work? 

Spatially coupled sensing matrices A are, roughly speaking, band diagonal matrices. It is convenient to think of the graph structure that they induce on the reconstruction problem. Associate one node (a variable node in the language of factor graphs) to each coordinate i in the unknown signal x. Order these nodes on the real line $\mathbb{R}$, putting the i-th node at location $i \in \mathbb{R}$. Analogously, associate a node (a factor node) to each coordinate a in the measurement vector y, and place the node a at position a/δ on the same line. Connect this node to all the variable nodes i such that $A_{ai} \neq 0$. If A is band diagonal, only nodes that are placed close enough will be connected by an edge. See Figure 1 for an illustration.

In a spatially coupled matrix, additional measurements are associated to the first few coordinates of x, say coordinates $x_1, \dots, x_{n_0}$ with $n_0$ much smaller than n. This has a negligible impact on the overall undersampling ratio as $n/n_0 \to \infty$. Although the overall undersampling remains δ < 1, the coordinates $x_1, \dots, x_{n_0}$ are oversampled. This ensures that these first coordinates are recovered correctly (up to a mean square error of order $\sigma^2$). As the algorithm is iterated, the contribution of these first few coordinates is correctly subtracted from all the measurements, and hence we can
Figure 1: Graph structure of a spatially coupled matrix, with the additional measurements associated to the first few coordinates. Variable nodes are shown as circles and check nodes are represented by squares.



effectively eliminate those nodes from the graph. In the resulting graph, the first few variables are effectively oversampled and hence the algorithm will reconstruct their values, up to a mean square error of order $\sigma^2$. As the process is iterated, variables are progressively reconstructed, proceeding from left to right along the node layout.
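The band-diagonal structure with an oversampled 'seed' can be sketched in a few lines. This is a hypothetical toy construction, not the one of Section 2.3 (which is not reproduced here): the block sizes, the coupling width `w`, the seeding rule and the variance normalization are all our assumptions, chosen so that every column has expected squared norm 1.

```python
import numpy as np

def coupled_matrix(L, N, delta, delta_seed, w, rng):
    """Toy band-diagonal ('spatially coupled') Gaussian sensing matrix.

    Columns come in L blocks of N variables each. Row block r (r = 0, ..., L-1)
    has round(delta_seed * N) rows if r == 0 (the oversampled 'seed') and
    round(delta * N) rows otherwise. Row block r only touches column blocks c
    with |r - c| <= w, so all nonzero entries lie in a band around the diagonal."""
    rows = [round(delta_seed * N)] + [round(delta * N)] * (L - 1)

    def in_band(r, c):
        return abs(r - c) <= w

    # Variance per column block, chosen so that E ||A e_i||_2^2 = 1 for every column.
    rows_per_col_block = [sum(rows[r] for r in range(L) if in_band(r, c))
                          for c in range(L)]
    blocks = []
    for r in range(L):
        row_of_blocks = []
        for c in range(L):
            if in_band(r, c):
                sd = 1.0 / np.sqrt(rows_per_col_block[c])
                row_of_blocks.append(rng.normal(0.0, sd, size=(rows[r], N)))
            else:
                row_of_blocks.append(np.zeros((rows[r], N)))
        blocks.append(row_of_blocks)
    return np.block(blocks)

rng = np.random.default_rng(0)
A = coupled_matrix(L=20, N=100, delta=0.5, delta_seed=2.0, w=2, rng=rng)
m, n = A.shape
print(m, n, m / n)   # 1150 2000 0.575
```

Here the seed block adds $(\delta_{\rm seed} - \delta)N = 150$ rows, an overhead of 0.075 in m/n; keeping N fixed and letting L grow makes this overhead vanish, in line with the $n/n_0 \to \infty$ remark above.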

While the above explains the basic dynamics of AMP reconstruction algorithms under spatial coupling, a careful consideration reveals that this picture leaves open several challenging questions. In particular, why does the overall undersampling factor δ have to exceed $\overline{d}(p_X)$ for reconstruction to be successful? Our proof is based on a potential function argument. We will prove that there exists a potential function for the AMP algorithm such that, when $\delta > \overline{d}(p_X)$, this function has its global minimum close to exact reconstruction. Further, we will prove that, unless this minimum is essentially achieved, AMP can always decrease the function. This technique is different from the one followed in [KRU11] for LDPC codes over the binary erasure channel, and we think it is of independent interest.



1.5 Further related work 

The most closely related earlier work was already discussed above. 

More broadly, message passing algorithms for compressed sensing were the object of a number of studies, starting with [BSB10]. As mentioned, we will focus on approximate message passing (AMP) as introduced in [DMM09, DMM10]. As shown in [DJM11], these algorithms can be used in conjunction with a rich class of denoisers. A subset of these denoisers arise as the posterior mean associated to a prior $p_X$. Several interesting examples were studied by Schniter and collaborators [Sch10, Sch11, SPS10], and by Rangan and collaborators [Ran11, KGR11].

Spatial coupling has been the object of growing interest within coding theory over the last few years. The first instances of spatially coupled code ensembles were the convolutional LDPC codes of Felström and Zigangirov [FZ99]. While the excellent performance of such codes had been known for quite some time [SLJZ04], the fundamental reason was not elucidated until recently [KRU11] (see also [LF10]). In particular, [KRU11] proved, for communication over the binary erasure channel (BEC), that the thresholds of spatially coupled ensembles under message passing decoding coincide with the thresholds of the base LDPC code under MAP decoding. In particular, this implies that spatially coupled ensembles achieve capacity over the BEC. The analogous statement for general memoryless symmetric channels remains open, but substantial evidence was put forward in [KMRU10]. The paper [HMU10] discusses similar ideas in a number of graphical models.

The first application of spatial coupling ideas to compressed sensing is due to Kudekar and Pfister [KP10]. Their proposed message passing algorithms do not make use of the signal distribution $p_X$, and do not fully exploit the potential of spatially coupled matrices. The message passing algorithm used here belongs to the general class introduced in [DMM09]. The specific use of the minimum mean square error denoiser was suggested in [DMM10]. The same choice is made in [KMS+11].

Finally, let us mention that robust sparse recovery of k-sparse vectors from $m=O(k\log\log(n/k))$ measurements is possible, using suitable 'adaptive' sensing schemes [IPW11].



2 Matrix and algorithm construction 



In this section, we define an ensemble of random matrices, and the corresponding choices of $Q^t$, $b^t$, $\eta_t$ that achieve the reconstruction guarantees in Theorems 1.6 and 1.7. We proceed by first introducing a general ensemble of random matrices. Correspondingly, we define a deterministic recursion named state evolution, which plays a crucial role in the algorithm analysis. In Section 2.3, we define the algorithm parameters and construct specific choices of $Q^t$, $b^t$, $\eta_t$. The last section, Section 2.4, contains a restatement of Theorems 1.6 and 1.7, in which this construction is made explicit.



2.1 General matrix ensemble 

The sensing matrix A will be constructed randomly, from an ensemble denoted by $\mathcal{M}(W,M,N)$. The ensemble depends on two integers $M,N\in\mathbb{N}$, and on a matrix with non-negative entries $W\in\mathbb{R}_+^{R\times C}$, whose rows and columns are indexed by the finite sets R, C (respectively 'rows' and 'columns'). The matrix is roughly row-stochastic, i.e.

$$\frac{1}{2}\le\sum_{c\in C}W_{r,c}\le 2,\qquad\text{for all } r\in R.\qquad (27)$$

We will let $|R|\equiv L_r$ and $|C|\equiv L_c$ denote the matrix dimensions. The ensemble parameters are related to the sensing matrix dimensions by $n=NL_c$ and $m=ML_r$.

In order to describe a random matrix $A\sim\mathcal{M}(W,M,N)$ from this ensemble, partition the column and row indices into, respectively, $L_c$ and $L_r$ groups of equal size. Explicitly

$$[n]=\bigcup_{s\in C}C(s),\quad |C(s)|=N,\qquad [m]=\bigcup_{r\in R}R(r),\quad |R(r)|=M.$$

Here and below we use $[k]$ to denote the set of the first k integers, $[k]\equiv\{1,2,\dots,k\}$. Further, if $i\in R(r)$ or $j\in C(s)$ we will write, respectively, $r=g(i)$ or $s=g(j)$. In other words, $g(\cdot)$ is the operator determining the group index of a given row or column.
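The partition and the group operator g(·) can be made concrete in a few lines. The following is a minimal Python sketch, not the authors' code, that samples a matrix from the ensemble of Definition 2.1 below, i.e. independent entries $A_{ij}\sim\mathsf{N}(0,W_{g(i),g(j)}/M)$. It assumes 0-based consecutive groups (so g(i) = i // M for rows and g(j) = j // N for columns) and stores W as a list of lists indexed by group position; the helper names `group_of` and `sample_ensemble` are hypothetical.

```python
import random


def group_of(index, group_size):
    """Group operator g(.): 0-based index -> 0-based group position.

    Rows are split into consecutive blocks of M indices and columns into
    consecutive blocks of N indices, so the group is index // group_size.
    """
    return index // group_size


def sample_ensemble(W, M, N, seed=0):
    """Draw A ~ M(W, M, N): independent A[i][j] ~ N(0, W[g(i)][g(j)] / M).

    W is a list of L_r rows, each holding L_c non-negative weights.
    Returns an m x n matrix (list of lists) with m = M * L_r, n = N * L_c.
    """
    rng = random.Random(seed)
    L_r, L_c = len(W), len(W[0])
    A = []
    for i in range(M * L_r):
        r = group_of(i, M)
        row = [rng.gauss(0.0, (W[r][group_of(j, N)] / M) ** 0.5)
               for j in range(N * L_c)]
        A.append(row)
    return A


if __name__ == "__main__":
    W = [[1.0, 0.0],
         [0.5, 0.5]]            # toy 2 x 2 variance profile
    A = sample_ensemble(W, M=3, N=2)
    print(len(A), len(A[0]))    # 6 4
```

Blocks with $W_{r,s}=0$ come out identically zero, which is exactly the band structure that makes the matrix spatially coupled.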

With this notation we have the following concise definition of the ensemble. 

Definition 2.1. A random sensing matrix A is distributed according to the ensemble $\mathcal{M}(W,M,N)$ (and we write $A\sim\mathcal{M}(W,M,N)$) if the entries $\{A_{ij},\ i\in[m],\ j\in[n]\}$ are independent Gaussian random variables with¹

$$A_{ij}\sim\mathsf{N}\Big(0,\frac{1}{M}\,W_{g(i),g(j)}\Big).\qquad (28)$$

2.2 State evolution



State evolution allows an exact asymptotic analysis of AMP algorithms in the limit of a large number of dimensions. As indicated by the name, it bears close resemblance to the density evolution method in iterative coding theory [RU08]. Somewhat surprisingly, this analysis approach is asymptotically exact despite the underlying factor graph being far from locally tree-like.

State evolution was first developed in [DMM09] on the basis of heuristic arguments and substantial numerical evidence. Subsequently, it was proved to hold for Gaussian sensing matrices with i.i.d. entries, and a broad class of iterative algorithms, in [BM11a]. These proofs were further generalized in [Ran11] to cover 'generalized' AMP algorithms.

In the present case, state evolution takes the following form.²

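Definition 2.2. Given $W\in\mathbb{R}_+^{R\times C}$ roughly row-stochastic, and δ > 0, the corresponding state evolution maps $T'_W:\mathbb{R}_+^R\to\mathbb{R}_+^C$, $T''_W:\mathbb{R}_+^C\to\mathbb{R}_+^R$ are defined as follows. For $\phi=(\phi_a)_{a\in R}\in\mathbb{R}_+^R$ and $\psi=(\psi_i)_{i\in C}\in\mathbb{R}_+^C$, we let:

$$T'_W(\phi)_i=\mathrm{mmse}\Big(\sum_{b\in R}W_{b,i}\,\phi_b^{-1}\Big),\qquad (29)$$

$$T''_W(\psi)_a=\sigma^2+\frac{1}{\delta}\sum_{i\in C}W_{a,i}\,\psi_i.\qquad (30)$$

We finally define $T_W=T'_W\circ T''_W$.

In the following, we shall omit the subscripts from $T_W$ whenever clear from the context.

Definition 2.3. Given $W\in\mathbb{R}_+^{L_r\times L_c}$ roughly row-stochastic, the corresponding state evolution sequence is the sequence of vectors $\{\phi(t),\psi(t)\}_{t\ge 0}$, $\phi(t)=(\phi_a(t))_{a\in R}\in\mathbb{R}_+^R$, $\psi(t)=(\psi_i(t))_{i\in C}\in\mathbb{R}_+^C$, defined recursively by $\phi(t)=T''_W(\psi(t))$, $\psi(t+1)=T'_W(\phi(t))$, with initial condition

$$\psi_i(0)=\infty\qquad\text{for all } i\in C.\qquad (31)$$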



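Hence, for all $t\ge 0$,

$$\phi_a(t)=\sigma^2+\frac{1}{\delta}\sum_{i\in C}W_{a,i}\,\psi_i(t),\qquad \psi_i(t+1)=\mathrm{mmse}\Big(\sum_{b\in R}W_{b,i}\,\phi_b(t)^{-1}\Big).\qquad (32)$$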



¹As in many papers on compressed sensing, the matrix here has independent Gaussian entries; however, unlike standard practice, here the entries are of widely different variances.

²In previous work, the state variable concerned a single scalar, representing the mean-squared error in the current reconstruction, averaged across all coordinates. In this paper, the dimensionality of the state variable is much larger, because it contains ψ, an individualized MSE for each coordinate of the reconstruction, and also φ, a pseudo-data MSE for each measurement coordinate.
In words, and as implicit in Definition 2.3, $\psi=T'_W(\phi)$ computes the formal MSE ψ of coordinates of x, for a specified level of formal MSE φ of coordinates of pseudo-data r. Similarly, $\phi=T''_W(\psi)$ computes the formal MSE in components of r given the formal MSE of coordinates of x. Section 4 below gives a heuristic derivation of state evolution.
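Since (32) is a deterministic recursion on vectors of length $L_r$ and $L_c$, it is cheap to iterate numerically. The sketch below, a hypothetical helper rather than the authors' implementation, takes W as a list of rows, the parameters δ and σ², and any callable `mmse` for the prior; the Gaussian prior $X\sim\mathsf{N}(0,1)$, for which $\mathrm{mmse}(s)=1/(1+s)$, is used only as an illustrative assumption. Entries with $W_{a,i}=0$ are skipped so that the initial condition $\psi_i(0)=\infty$ of (31) never produces $0\cdot\infty$.

```python
import math


def state_evolution(W, delta, sigma2, mmse, T):
    """Iterate Eq. (32) for T steps starting from psi_i(0) = inf, Eq. (31).

    W[a][i] is indexed by row group a in R and column group i in C.
    Returns (phis, psis) with phis[t] = phi(t) and psis[t] = psi(t).
    """
    L_r, L_c = len(W), len(W[0])
    psi = [math.inf] * L_c
    phis, psis = [], [psi]
    for _ in range(T):
        # phi_a(t) = sigma^2 + (1/delta) * sum_i W_{a,i} psi_i(t)
        phi = [sigma2 + sum(W[a][i] * psi[i] for i in range(L_c) if W[a][i] > 0) / delta
               for a in range(L_r)]
        # psi_i(t+1) = mmse(sum_b W_{b,i} / phi_b(t)); a finite weight over inf gives 0.0
        psi = [mmse(sum(W[b][i] / phi[b] for b in range(L_r))) for i in range(L_c)]
        phis.append(phi)
        psis.append(psi)
    return phis, psis


def gaussian_mmse(s):
    """mmse(s) for X ~ N(0, 1): Var(X | sqrt(s) X + Z) = 1 / (1 + s)."""
    return 1.0 / (1.0 + s)


if __name__ == "__main__":
    phis, psis = state_evolution([[1.0]], delta=2.0, sigma2=0.01, mmse=gaussian_mmse, T=200)
    print(psis[1], phis[-1])
```

With a single block (W = [[1.0]]) this reduces to the scalar state evolution of the i.i.d. case; a band-diagonal W gives the coupled recursion studied in the rest of the paper.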

2.3 General algorithm definition 

In order to fully define the AMP algorithm (6), (7), we need to provide constructions for the matrix $Q^t$, the nonlinearities $\eta_t$, and the vector $b^t$. In doing this, we exploit the fact that the state evolution sequence $\{\phi(t)\}_{t\ge 0}$ can be precomputed.
We define the matrix $Q^t$ by

$$Q^t_{ij}=\frac{\phi_{g(i)}(t)^{-1}}{\sum_{k=1}^{L_r}W_{k,g(j)}\,\phi_k(t)^{-1}}.\qquad (33)$$

Notice that $Q^t$ is block-constant: for any $r\in R$, $s\in C$, the block $Q^t_{R(r),C(s)}$ has all its entries equal.
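The block values $\widetilde{Q}^t_{r,s}$ of (33) depend only on φ(t) and W. The small sketch below (a hypothetical helper in pure Python, assuming every column of W has at least one positive entry so that the denominator is non-zero) computes them and checks the normalization $\sum_{r\in R}W_{r,s}\widetilde{Q}^t_{r,s}=1$, which follows directly from (33) by summing the numerator against $W_{r,s}$.

```python
def q_blocks(W, phi):
    """Block values of Q^t from Eq. (33).

    Q[r][s] = phi_r^{-1} / sum_k W[k][s] * phi_k^{-1}, for row group r and column group s.
    """
    L_r, L_c = len(W), len(W[0])
    inv = [1.0 / p for p in phi]
    col_norm = [sum(W[k][s] * inv[k] for k in range(L_r)) for s in range(L_c)]
    return [[inv[r] / col_norm[s] for s in range(L_c)] for r in range(L_r)]


if __name__ == "__main__":
    W = [[1.0, 0.5], [0.5, 1.0], [0.2, 0.0]]
    phi = [0.1, 0.3, 2.0]
    Q = q_blocks(W, phi)
    for s in range(2):
        # Column-wise normalization implied by (33): sum_r W[r][s] * Q[r][s] == 1
        print(sum(W[r][s] * Q[r][s] for r in range(3)))
```

Rows with smaller pseudo-data MSE $\phi_r(t)$ receive larger weight, which is the intended effect of (33).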
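As mentioned in Section 1, the function $\eta_t:\mathbb{R}^n\to\mathbb{R}^n$ is chosen to be separable, i.e. for $v\in\mathbb{R}^n$:

$$\eta_t(v)=\big(\eta_{t,1}(v_1),\eta_{t,2}(v_2),\dots,\eta_{t,n}(v_n)\big).\qquad (34)$$

We take $\eta_{t,i}$ to be a conditional expectation estimator for $X\sim p_X$ in Gaussian noise:

$$\eta_{t,i}(v_i)=\mathbb{E}\big\{X\,\big|\,X+s_{g(i)}(t)^{-1/2}Z=v_i\big\},\qquad s_r(t)\equiv\sum_{u\in R}W_{u,r}\,\phi_u(t)^{-1},\qquad (35)$$

with $Z\sim\mathsf{N}(0,1)$ independent of X. Notice that the function $\eta_{t,i}(\cdot)$ depends on i only through the group index g(i), and in fact only parametrically through $s_{g(i)}(t)$.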
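For a concrete prior the conditional expectation in (35) is a finite sum. The sketch below assumes a discrete prior given as a list of (value, probability) pairs, a hypothetical choice made only for illustration, and s stands for $s_{g(i)}(t)$. It also evaluates the derivative needed in (36) through the Gaussian-channel identity $\eta'(v)=s\,\mathrm{Var}(X\mid v)$, obtained by differentiating the posterior weights $\propto p_x e^{-s(v-x)^2/2}$ in v. Function names are hypothetical.

```python
import math


def _posterior(v, s, prior):
    """Posterior weights of X given X + s^{-1/2} Z = v, for prior [(x, p_x), ...]."""
    logw = [math.log(p) - 0.5 * s * (v - x) ** 2 for x, p in prior]
    top = max(logw)                        # subtract the max for numerical stability
    w = [math.exp(l - top) for l in logw]
    z = sum(w)
    return [wk / z for wk in w]


def eta(v, s, prior):
    """Posterior mean E{X | X + s^{-1/2} Z = v}, as in Eq. (35)."""
    return sum(pk * x for pk, (x, _) in zip(_posterior(v, s, prior), prior))


def eta_prime(v, s, prior):
    """Derivative of eta in v: s * Var(X | v) for Gaussian noise of variance 1/s."""
    post = _posterior(v, s, prior)
    m1 = sum(pk * x for pk, (x, _) in zip(post, prior))
    m2 = sum(pk * x * x for pk, (x, _) in zip(post, prior))
    return s * (m2 - m1 * m1)


if __name__ == "__main__":
    prior = [(0.0, 0.9), (1.0, 0.1)]       # hypothetical sparse two-point prior
    print(eta(0.5, 4.0, prior))            # approximately 0.1: equidistant point gives the prior mean
```

Averaging `eta_prime` over the coordinates of a column group gives the quantity $\langle\eta'_t\rangle_u$ in (36).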

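Finally, in order to define the vector $b^t$, let us introduce the quantity

$$\langle\eta'_t\rangle_u=\frac{1}{N}\sum_{i\in C(u)}\eta'_{t,i}\Big(x^t_i+\big((Q^t\odot A)^*r^t\big)_i\Big).\qquad (36)$$

The vector $b^t$ is then defined by

$$b^t_i=\frac{1}{\delta}\sum_{u\in C}W_{g(i),u}\,\widetilde{Q}^{t-1}_{g(i),u}\,\langle\eta'_{t-1}\rangle_u,\qquad (37)$$

where we defined $\widetilde{Q}^t_{r,u}=Q^t_{i,j}$ for $i\in R(r)$, $j\in C(u)$. Again $b^t$ is block-constant: the vector $b^t_{R(r)}$ has all its entries equal.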

This completes our definition of the AMP algorithm. Let us conclude with a few computational remarks:

1. The quantities $Q^t$, $\phi(t)$ can be precomputed efficiently iteration by iteration, because they are, respectively, $L_r\times L_c$ and $L_r$-dimensional, and, as discussed further below, $L_r$, $L_c$ are much smaller than m, n. The most complex part of this computation is implementing the iteration (32), which has complexity $O((L_r+L_c)^2)$, plus the complexity of evaluating the mmse function, which is a one-dimensional integral.
2. The vector $b^t$ is also block-constant, so it can be efficiently computed using Eq. (37).



3. Instead of computing φ(t) analytically by iteration (32), φ(t) can also be estimated from the data $x^t$, $r^t$. In particular, by generalizing the methods introduced in [DMM09, Mon12], we get the estimator

$$\hat\phi_a(t)=\frac{1}{M}\big\|r^t_{R(a)}\big\|_2^2,\qquad (38)$$

where $r^t_{R(a)}=(r^t_j)_{j\in R(a)}$ is the restriction of $r^t$ to the indices in R(a). An alternative, more robust estimator would be

$$\hat\phi_a(t)^{1/2}=\frac{1}{\Phi^{-1}(3/4)}\,\big|r^t_{R(a)}\big|_{(M/2)},\qquad (39)$$

where Φ(z) is the Gaussian distribution function, and, for $v\in\mathbb{R}^K$, $|v|_{(\ell)}$ is the ℓ-th largest entry in the vector $(|v_1|,|v_2|,\dots,|v_K|)$. The idea underlying both of the above estimators is that the components of $r^t_{R(a)}$ are asymptotically i.i.d. with mean zero and variance $\phi_a(t)$.
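Both estimators only need the M residual entries of one row group. The sketch below implements (38) and (39), using the standard library's `statistics.NormalDist` for $\Phi^{-1}(3/4)$; it assumes M is even so that the (M/2)-th largest magnitude is well defined, and the function names are hypothetical. The usage example contaminates Gaussian residuals with a few large outliers to show why (39) is described as more robust.

```python
import random
from statistics import NormalDist


def phi_hat_mean_square(r_block):
    """Eq. (38): (1/M) * ||r_{R(a)}||_2^2."""
    return sum(x * x for x in r_block) / len(r_block)


def phi_hat_robust(r_block):
    """Eq. (39): phi_hat^{1/2} = |r_{R(a)}|_(M/2) / Phi^{-1}(3/4)."""
    mags = sorted((abs(x) for x in r_block), reverse=True)
    m_half = mags[len(mags) // 2 - 1]          # (M/2)-th largest entry, 1-based
    return (m_half / NormalDist().inv_cdf(0.75)) ** 2


if __name__ == "__main__":
    rng = random.Random(0)
    r = [rng.gauss(0.0, 0.5) for _ in range(40000)]   # variance phi_a = 0.25
    r += [50.0] * 100                                  # a few gross outliers
    print(phi_hat_mean_square(r), phi_hat_robust(r))
```

The constant $\Phi^{-1}(3/4)\approx 0.674$ is the median of |Z| for standard normal Z, so (39) is a median-absolute-value estimate; a handful of outliers moves the median only slightly, while it inflates the mean square in (38).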

2.4 Choices of parameters 



In order to prove our main Theorem 1.6, we use a sensing matrix from the ensemble $\mathcal{M}(W,M,N)$ for a suitable choice of the matrix $W\in\mathbb{R}^{R\times C}$. Our construction depends on parameters $\rho\in\mathbb{R}_+$, $L,L_0\in\mathbb{N}$, and on the 'shape function' $\mathcal{W}$. As explained below, ρ will be taken to be small, and hence we will treat 1/ρ as an integer to avoid rounding (which introduces in any case a negligible error).

Definition 2.4. A shape function is a function $\mathcal{W}:\mathbb{R}\to\mathbb{R}_+$, continuously differentiable, with support in [−1, 1] and such that $\int_{\mathbb{R}}\mathcal{W}(u)\,du=1$, and $\mathcal{W}(-u)=\mathcal{W}(u)$.

We let $C=\{-2\rho^{-1},\dots,0,1,\dots,L-1\}$, so that $L_c=L+2\rho^{-1}$. The rows are partitioned as follows:

$$R=R_0\cup\Big\{\bigcup_{i=-2\rho^{-1}}^{-1}R_i\Big\},$$

where $R_0=\{-\rho^{-1},\dots,0,1,\dots,L-1+\rho^{-1}\}$, and $|R_i|=L_0$. Hence $L_r=L_c+2\rho^{-1}L_0$.
Finally, we take N so that $n=NL_c$, and let $M=N\delta$ so that $m=ML_r=N(L_c+2\rho^{-1}L_0)\delta$. Notice that $m/n=\delta(L_c+2\rho^{-1}L_0)/L_c$. Since we will take $L_c$ much larger than $L_0/\rho$, we in fact have m/n arbitrarily close to δ.

Given these inputs, we construct the corresponding matrix $W=W(L,L_0,\mathcal{W},\rho)$ as follows:

1. For $i\in\{-2\rho^{-1},\dots,-1\}$, and each $a\in R_i$, we let $W_{a,i}=1$. Further, $W_{a,j}=0$ for all $j\in C\setminus\{i\}$.

2. For all $a\in R_0=\{-\rho^{-1},\dots,0,\dots,L-1+\rho^{-1}\}$, we let

$$W_{a,i}=\rho\,\mathcal{W}\big(\rho(a-i)\big),\qquad i\in\{-2\rho^{-1},\dots,L-1\}.\qquad (40)$$

See Fig. 2 for an illustration of the matrix W. In the following we occasionally use the shorthand $W_{a-i}\equiv\rho\,\mathcal{W}(\rho(a-i))$.
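The two construction rules translate directly into code. The sketch below builds $W(L,L_0,\mathcal{W},\rho)$ with rows ordered as $R_0$ followed by the blocks $R_i$, keeping the actual (possibly negative) indices alongside. The raised-cosine shape $\mathcal{W}(u)=\tfrac12(1+\cos\pi u)$ on [−1, 1] is a hypothetical choice that satisfies Definition 2.4 (it is continuously differentiable, symmetric, supported on [−1, 1], and integrates to 1); any other shape function could be passed instead. The name `build_W` is likewise hypothetical.

```python
import math


def raised_cosine(u):
    """A shape function in the sense of Definition 2.4 (illustrative choice)."""
    return 0.5 * (1.0 + math.cos(math.pi * u)) if abs(u) <= 1.0 else 0.0


def build_W(L, L0, inv_rho, shape=raised_cosine):
    """Return (row_labels, cols, W) for W(L, L0, shape, rho) with rho = 1 / inv_rho.

    cols = C = {-2/rho, ..., L-1}.  Rows are R_0 = {-1/rho, ..., L-1+1/rho}
    (label ('R0', a)) followed by L0 copies of each R_i, i = -2/rho..-1 (label ('R', i)).
    """
    rho = 1.0 / inv_rho
    cols = list(range(-2 * inv_rho, L))
    row_labels = [("R0", a) for a in range(-inv_rho, L + inv_rho)]
    row_labels += [("R", i) for i in range(-2 * inv_rho, 0) for _ in range(L0)]
    W = []
    for kind, a in row_labels:
        if kind == "R0":                                   # rule 2, Eq. (40)
            W.append([rho * shape(rho * (a - i)) for i in cols])
        else:                                              # rule 1: weight 1 on column i only
            W.append([1.0 if j == a else 0.0 for j in cols])
    return row_labels, cols, W


if __name__ == "__main__":
    rows, cols, W = build_W(L=20, L0=3, inv_rho=4)
    print(len(rows), len(cols))   # 52 28, i.e. L_r = L_c + 2 L0 / rho and L_c = L + 2 / rho
```

With this particular shape, the Riemann sum $\rho\sum_k\mathcal{W}(\rho k)$ over a full window equals 1 exactly, so rows of $R_0$ away from the right edge, and all columns $i\in\{0,\dots,L-1\}$ restricted to $R_0$, sum to 1 up to floating-point error.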
[Figure 2 placeholder; annotations: $2\rho^{-1}$, $L+2\rho^{-1}$, ρ, $\mathcal{W}(\rho(a-i))$]

Figure 2: Matrix W



It is not hard to check that W is roughly row-stochastic. Also, the restriction of W to columns in $C_0$ is roughly column-stochastic.

We are now in position to restate Theorem 1.6 in a more explicit form. 

Theorem 2.5. Let $p_X$ be a probability measure on the real line with $\delta>d(p_X)$, and let $\mathcal{W}:\mathbb{R}\to\mathbb{R}_+$ be a shape function. For any ε > 0, there exist $L_0,L,\rho,t_0,\sigma_0^2>0$ such that $L_0/(L\rho)<\varepsilon$, and further the following holds true for $W=W(L,L_0,\mathcal{W},\rho)$.

For N > 0, and $A(n)\sim\mathcal{M}(W,M,N)$ with $M=N\delta$, and for all $\sigma^2\le\sigma_0^2$, $t\ge t_0$, we almost surely have

$$\limsup_{N\to\infty}\frac{1}{n}\big\|x^t(A(n);y(n))-x(n)\big\|^2\le\varepsilon.\qquad (41)$$

Further, under the same assumptions, we have

$$\limsup_{N\to\infty}\frac{1}{n}\mathbb{E}\big\{\big\|x^t(A(n);y(n))-x(n)\big\|^2\big\}\le\varepsilon.\qquad (42)$$



In order to obtain a stronger form of robustness, as per Theorem 1.7, we slightly modify the sensing scheme. We construct the sensing matrix $\tilde A$ from A by appending $2\rho^{-1}L_0$ rows at the bottom:

$$\tilde A=\begin{pmatrix}A\\ \begin{matrix}0 & I\end{matrix}\end{pmatrix},\qquad (43)$$

where I is the identity matrix of dimensions $2\rho^{-1}L_0$. Note that this corresponds to increasing the number of measurements; however, the asymptotic undersampling rate remains δ, provided that $L_0/(L\rho)\to 0$, as $n\to\infty$.

The reconstruction scheme is modified as follows. Let $x_1$ be the vector obtained by restricting x to entries in $\cup_i C(i)$, where $i\in\{-2\rho^{-1},\dots,L-2\rho^{-1}-1\}$. Also, let $x_2$ be the vector obtained by restricting x to entries in $\cup_i C(i)$, where $i\in\{L-2\rho^{-1},\dots,L-1\}$. Therefore, $x=(x_1,x_2)^T$. Analogously, let $y=(y_1,y_2)^T$, where $y_1$ is given by the restriction of y to $\cup_{i\in R}R(i)$ and $y_2$ corresponds to the additional $2\rho^{-1}L_0$ rows. Define $w_1$ and $w_2$ from the noise vector w analogously. Hence,

$$\begin{pmatrix}y_1\\ y_2\end{pmatrix}=\tilde A\begin{pmatrix}x_1\\ x_2\end{pmatrix}+\begin{pmatrix}w_1\\ w_2\end{pmatrix}.\qquad (44)$$

Note that the sampling rate for vector $x_2$ is one, i.e., $y_2$ and $x_2$ are of the same length and are related to each other through the identity matrix I. Hence, we have a fairly good approximation of these entries. We use the AMP algorithm as described in the previous section to obtain an estimate of $x_1$. Formally, let $x^t_1$ be the estimate at iteration t obtained by applying the AMP algorithm. The modified estimate is then $\tilde x^t=(x^t_1,y_2)^T$.

As we will see later, this modification in the sensing matrix and algorithm, while not necessary, simplifies some technical steps in the proof.

Theorem 2.6. Let $p_X$ be a probability measure on the real line with $\delta>D(p_X)$, and let $\mathcal{W}:\mathbb{R}\to\mathbb{R}_+$ be a shape function. There exist $L_0,L,\rho,t_0$ and a finite stability constant $C=C(p_X,\delta)$, such that $L_0/(L\rho)<\varepsilon$, for any given ε > 0, and the following holds true for the modified reconstruction scheme.

For $t\ge t_0$, we almost surely have

$$\limsup_{N\to\infty}\frac{1}{n}\big\|\tilde x^t(\tilde A(n);y(n))-x(n)\big\|^2\le C\sigma^2.\qquad (45)$$

Further, under the same assumptions, we have

$$\limsup_{N\to\infty}\frac{1}{n}\mathbb{E}\big\{\big\|\tilde x^t(\tilde A(n);y(n))-x(n)\big\|^2\big\}\le C\sigma^2.\qquad (46)$$



It is obvious that Theorems 2.5 and 2.6 respectively imply Theorems 1.6 and 1.7. We shall therefore focus on the proofs of Theorems 2.5 and 2.6 in the rest of the paper.



3 Key lemmas and proof of the main theorems 

Our proof is based in a crucial way on state evolution. This effectively reduces the analysis of the algorithm (6), (7) to the analysis of the deterministic recursion (32).

Lemma 3.1. Let $W\in\mathbb{R}_+^{R\times C}$ be a roughly row-stochastic matrix (see Eq. (27)) and let $\phi(t)$, $Q^t$, $b^t$ be defined as in Section 2.3. Let $M=M(N)$ be such that $M/N\to\delta$, as $N\to\infty$. Define $m=ML_r$, $n=NL_c$, and for each $N\ge 1$, let $A(n)\sim\mathcal{M}(W,M,N)$. Let $\{(x(n),w(n))\}_{n\ge 0}$ be a converging sequence of instances with parameters $(p_X,\sigma^2)$. Then, for all $t\ge 1$, almost surely we have

$$\limsup_{N\to\infty}\frac{1}{N}\big\|x^t_{C(i)}(A(n);y(n))-x_{C(i)}\big\|_2^2=\mathrm{mmse}\Big(\sum_{a\in R}W_{a,i}\,\phi_a(t-1)^{-1}\Big),\qquad (47)$$

for all $i\in C$.
This lemma is a straightforward generalization of [BM11a]. Since a formal proof does not require new ideas, but a significant amount of new notation, it is presented in a separate forthcoming publication [BLM12] which covers an even more general setting. In the interest of self-containedness, and to develop useful intuition on state evolution, we present a heuristic derivation of the state evolution equations (32) in Section 4.

The next lemma provides the needed analysis of the recursion (32).

Lemma 3.2. Let δ > 0, and let $p_X$ be a probability measure on the real line. Let $\mathcal{W}:\mathbb{R}\to\mathbb{R}_+$ be a shape function.

(a) If $\delta>d(p_X)$, then for any ε > 0, there exist $\sigma_0,\rho,L_*>0$, such that for any $\sigma^2\in[0,\sigma_0^2]$, $L_0>3/\delta$, and $L>L_*$, the following holds for $W=W(L,L_0,\mathcal{W},\rho)$:

$$\lim_{t\to\infty}\frac{1}{L}\sum_{a=-\rho^{-1}}^{L-1+\rho^{-1}}\phi_a(t)\le\varepsilon.\qquad (48)$$

(b) If $\delta>D(p_X)$, then there exist $\rho,L_*>0$, and a finite stability constant $C=C(p_X,\delta)$, such that for $L_0>3/\delta$, and $L>L_*$, the following holds for $W=W(L,L_0,\mathcal{W},\rho)$:

$$\lim_{t\to\infty}\frac{1}{L}\sum_{a=-\rho^{-1}}^{L-\rho^{-1}-1}\phi_a(t)\le C\sigma^2.\qquad (49)$$

The proof of this lemma is deferred to Section 5 and is indeed the technical core of the paper.
Now, we have in place all we need to prove our main results.



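Proof (Theorem 2.5). Recall that $C=\{-2\rho^{-1},\dots,L-1\}$. Therefore,

$$\begin{aligned}
\limsup_{N\to\infty}\frac{1}{n}\|x^t(A(n);y(n))-x(n)\|^2
&=\limsup_{N\to\infty}\frac{1}{L_c}\sum_{i\in C}\frac{1}{N}\|x^t_{C(i)}(A(n);y(n))-x_{C(i)}(n)\|^2\\
&\overset{(a)}{\le}\frac{1}{L_c}\sum_{i\in C}\mathrm{mmse}\Big(\sum_{a\in R}W_{a,i}\,\phi_a(t-1)^{-1}\Big)\\
&\overset{(b)}{\le}\frac{1}{L_c}\sum_{i\in C}\mathrm{mmse}\Big(\frac{1}{2}\,\phi_{i+\rho^{-1}}(t-1)^{-1}\Big)\\
&\overset{(c)}{\le}\frac{2}{L_c}\sum_{a=-\rho^{-1}}^{L-1+\rho^{-1}}\phi_a(t-1).
\end{aligned}\qquad (50)$$

Here, (a) follows from Lemma 3.1; (b) follows from the fact that $\phi_a(t)$ is nondecreasing in a for every t (see Lemma 5.9 below) and from the fact that W is roughly column-stochastic; (c) follows from the inequality $\mathrm{mmse}(s)\le 1/s$. The result is immediate due to Lemma 3.2, Part (a).

The claim regarding the expected error follows readily since X has bounded second moment. □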
Proof (Theorem 2.6). The proof proceeds in a similar manner to the proof of Theorem 2.5:

$$\begin{aligned}
\limsup_{N\to\infty}\frac{1}{n}\|\tilde x^t(\tilde A(n);y(n))-x(n)\|^2
&=\limsup_{N\to\infty}\frac{1}{n}\Big\{\sum_{i=-2\rho^{-1}}^{L-2\rho^{-1}-1}\|x^t_{C(i)}(A(n);y(n))-x_{C(i)}(n)\|^2+\|w_2(n)\|^2\Big\}\\
&\le\frac{2}{L_c}\sum_{a=-\rho^{-1}}^{L-\rho^{-1}-1}\phi_a(t-1)+\limsup_{N\to\infty}\frac{1}{n}\|w_2(n)\|^2\le C\sigma^2,
\end{aligned}\qquad (51)$$

where the last step follows from Part (b) in Lemma 3.2, and Part (b) in Definition 1.1.

Again, the claim regarding the expected error is immediate since X has bounded second moment. □



4 State evolution: an heuristic derivation 



This section presents a heuristic derivation of the state evolution equations (32). Our objective is to provide some basic intuition: a proof in a more general setting will appear in a separate publication [BLM12]. A heuristic derivation similar to the present one, for the special case of sensing matrices with i.i.d. entries, was presented in [BM11a].



Consider the AMP recursion (6), (7), and introduce the following modifications: (i) at each iteration, replace the random matrix A with a new independent copy A(t); (ii) replace the observation vector y with $y^t=A(t)x_0+w$; (iii) eliminate the last term in the update equation for $r^t$. Then, we have the following update rules:

$$x^{t+1}=\eta_t\big(x^t+(Q^t\odot A(t))^*r^t\big),\qquad (52)$$

$$r^t=y^t-A(t)x^t,\qquad (53)$$

where A(0), A(1), A(2), … are i.i.d. random matrices distributed according to the ensemble $\mathcal{M}(W,M,N)$, i.e.,

$$A_{ij}(t)\sim\mathsf{N}\Big(0,\frac{1}{M}\,W_{g(i),g(j)}\Big).$$

Rewriting the recursion by eliminating $r^t$, we obtain:

$$x^{t+1}=\eta_t\Big((Q^t\odot A(t))^*y^t+\big(I-(Q^t\odot A(t))^*A(t)\big)x^t\Big)\qquad (54)$$

$$\phantom{x^{t+1}}=\eta_t\Big(x_0+(Q^t\odot A(t))^*w+B(t)(x^t-x_0)\Big),\qquad (55)$$

where $B(t)=I-(Q^t\odot A(t))^*A(t)\in\mathbb{R}^{n\times n}$. Note that the recursion (55) does not correspond to the AMP update rules defined per Eqs. (6) and (7). In particular, it does not correspond to any practical algorithm. However, it is much easier to analyze, as it allows us to neglect completely the correlations induced by the fact that we should use the same sensing matrix A across different iterations. Also, it is useful for presenting the intuition behind the AMP algorithm and to emphasize the role of the term $b^t$ in the update rule for $r^t$. As it emerges from the proof of [BM11a], this term does asymptotically cancel dependencies across iterations.

By virtue of the central limit theorem, each entry of B(t) is approximately normal. More specifically, $B_{ij}(t)$ is approximately normal with mean zero and variance $(1/M)\sum_{r\in R}W_{r,g(i)}W_{r,g(j)}\widetilde{Q}^2_{r,g(i)}$, for $i,j\in[n]$. Define $f_t(s)=\lim_{N\to\infty}\frac{1}{N}\|x^t_{C(s)}-x_{0,C(s)}\|^2$ for $s\in C$. It is easy to show that distinct entries in B(t) are approximately independent. Also, B(t) is independent of $\{B(s)\}_{0\le s\le t-1}$, and in particular, of $x^t-x_0$. Hence, $B(t)(x^t-x_0)$ converges to a vector, say v, with independent normal entries, and for $i\in[n]$,

$$\lim_{N\to\infty}\mathbb{E}\{v_i^2\}=\frac{1}{\delta}\sum_{s\in C}\sum_{r\in R}W_{r,g(i)}\,W_{r,s}\,\widetilde{Q}^2_{r,g(i)}\,f_t(s).\qquad (56)$$

Conditional on w, $(Q^t\odot A(t))^*w$ is a vector of independent normal entries with mean 0. Also, the variance of its i-th entry, for $i\in[n]$, is

$$\sum_{r\in R}\widetilde{Q}^2_{r,g(i)}\,\frac{W_{r,g(i)}}{M}\,\|w_{R(r)}\|^2,\qquad (57)$$

which converges to $\sigma^2\sum_{r\in R}W_{r,g(i)}\widetilde{Q}^2_{r,g(i)}$, by the law of large numbers. With slightly more work, it can be shown that these entries are approximately independent of the ones of $B(t)(x^t-x_0)$.

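Summarizing, the i-th entry of the vector in the argument of $\eta_t$ in Eq. (55) converges to $X+\tau_t(g(i))Z$ with $Z\sim\mathsf{N}(0,1)$ independent of X, and

$$\tau_t^2(s)=\sum_{r\in R}W_{r,s}\,\widetilde{Q}^2_{r,s}\Big(\sigma^2+\frac{1}{\delta}\sum_{u\in C}W_{r,u}\,f_t(u)\Big),\qquad (58)$$

for $s\in C$. In addition, using Eq. (55) and invoking Eqs. (34), (35), each entry of $x^{t+1}_{C(s)}-x_{0,C(s)}$ converges to $\eta_{t,s}(X+\tau_t(s)Z)-X$, for $s\in C$. Therefore,

$$f_{t+1}(s)=\lim_{N\to\infty}\frac{1}{N}\|x^{t+1}_{C(s)}-x_{0,C(s)}\|^2=\mathbb{E}\big\{[\eta_{t,s}(X+\tau_t(s)Z)-X]^2\big\}=\mathrm{mmse}(\tau_t^{-2}(s)).\qquad (59)$$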



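Using Eqs. (58) and (59), we obtain:

$$\tau_{t+1}^2(s)=\sum_{r\in R}W_{r,s}\,\widetilde{Q}^2_{r,s}\Big(\sigma^2+\frac{1}{\delta}\sum_{u\in C}W_{r,u}\,\mathrm{mmse}\big(\tau_t^{-2}(u)\big)\Big).\qquad (60)$$

Applying the change of variable $\tau_t^{-2}(u)=\sum_{b\in R}W_{b,u}\,\phi_b^{-1}(t)$, and substituting for $\widetilde{Q}_{r,s}$ from Eq. (33), we obtain the state evolution recursion, Eq. (32).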
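The change of variable hinges on the identity $\sum_{r\in R}W_{r,s}\widetilde{Q}^2_{r,s}\,\phi_r=\big(\sum_{b\in R}W_{b,s}\phi_b^{-1}\big)^{-1}$, which holds when $\widetilde{Q}$ is given by (33) and $\phi_r$ stands for the bracket $\sigma^2+\delta^{-1}\sum_{u\in C}W_{r,u}f_t(u)$ in (58): substituting (33), the left-hand side equals $\sum_r W_{r,s}\phi_r^{-1}/(\sum_b W_{b,s}\phi_b^{-1})^2$. A quick numerical check in Python, with an arbitrary W and φ and a hypothetical helper name:

```python
import random


def tau2_via_58(W, phi, s):
    """Right-hand side of (58) for column group s, with Q from (33) and phi_r the bracket."""
    inv = [1.0 / p for p in phi]
    D = sum(W[r][s] * inv[r] for r in range(len(W)))
    Q = [inv[r] / D for r in range(len(W))]          # block values Q_{r,s} from (33)
    return sum(W[r][s] * Q[r] ** 2 * phi[r] for r in range(len(W)))


if __name__ == "__main__":
    rng = random.Random(1)
    W = [[rng.random() for _ in range(4)] for _ in range(6)]
    phi = [0.01 + rng.random() for _ in range(6)]
    for s in range(4):
        direct = 1.0 / sum(W[b][s] / phi[b] for b in range(6))
        print(tau2_via_58(W, phi, s), direct)          # the two columns agree
```

The agreement is exact up to rounding, for any positive φ, which is why the weights (33) turn (60) into (32).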

In conclusion, we showed that the state evolution recursion would hold if the matrix A was resampled independently from the ensemble $\mathcal{M}(W,M,N)$ at each iteration. However, in our proposed AMP algorithm, the matrix A is constant across iterations, and the above argument is not valid since $x^t$ and A are dependent. This dependency cannot be neglected even in the large system limit $N\to\infty$. Indeed, the term $b^t\odot r^{t-1}$ in the update rule for $r^t$ (which was removed in our argument above) leads to an asymptotic cancellation of these dependencies, as in [BM11a].



5 Analysis of state evolution 

Throughout this section $p_X$ is a given probability distribution over the real line, and $X\sim p_X$. Also, we will take σ > 0. The result for the noiseless model (Corollary 1.8) follows by letting σ ↓ 0. Recall the inequality

$$\mathrm{mmse}(s)\le\min\Big(\mathrm{Var}(X),\frac{1}{s}\Big).\qquad (61)$$

Definition 5.1. For two vectors $\phi,\tilde\phi\in\mathbb{R}^K$, we write $\phi\succeq\tilde\phi$ if $\phi_r\ge\tilde\phi_r$ for all $r\in\{1,\dots,K\}$.

Proposition 5.2. For any $W\in\mathbb{R}_+^{R\times C}$, the map $T_W:\mathbb{R}_+^C\to\mathbb{R}_+^C$ is monotone; i.e., if $\psi\succeq\tilde\psi$ then $T_W(\psi)\succeq T_W(\tilde\psi)$. Analogously, $T'_W$ and $T''_W$ are also monotone.

Proof. It follows immediately from the fact that $s\mapsto\mathrm{mmse}(s)$ is a monotone decreasing function. □

Proposition 5.3. The state evolution sequence $\{\phi(t),\psi(t)\}_{t\ge 0}$ with initial condition $\psi_i(0)=\infty$, for $i\in C$, is monotone decreasing, in the sense that $\phi(0)\succeq\phi(1)\succeq\phi(2)\succeq\cdots$ and $\psi(0)\succeq\psi(1)\succeq\psi(2)\succeq\cdots$.

Proof. Since $\psi_i(0)=\infty$ for all i, we have $\psi(0)\succeq\psi(1)$. The thesis follows from the monotonicity of the state evolution map. □

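Lemma 5.4. Assume $\delta L_0>3$. Then there exists $t_0$ (depending only on $p_X$) such that, for all $t\ge t_0$ and all $i\in\{-2\rho^{-1},\dots,-1\}$, $a\in R_i$, we have

$$\psi_i(t)\le\mathrm{mmse}\Big(\frac{L_0}{2\sigma^2}\Big)\le\frac{2\sigma^2}{L_0},\qquad (62)$$

$$\phi_a(t)\le\sigma^2+\frac{1}{\delta}\,\mathrm{mmse}\Big(\frac{L_0}{2\sigma^2}\Big)\le\Big(1+\frac{2}{\delta L_0}\Big)\sigma^2.\qquad (63)$$

Proof. Take $i\in\{-2\rho^{-1},\dots,-1\}$. For $a\in R_i$, we have $\phi_a(t)=\sigma^2+(1/\delta)\psi_i(t)$. Further, from $\mathrm{mmse}(s)\le 1/s$, we deduce that

$$\psi_i(t+1)=\mathrm{mmse}\Big(\sum_{b\in R}W_{b,i}\,\phi_b(t)^{-1}\Big)\le\Big(\sum_{a\in R_i}\phi_a(t)^{-1}\Big)^{-1}=\frac{\phi_a(t)}{L_0}.\qquad (64)$$

Substituting in the earlier relation, we get $\psi_i(t+1)\le(1/L_0)\big(\sigma^2+(1/\delta)\psi_i(t)\big)$. Recalling that $\delta L_0>3$, we have $\psi_i(t)\le 2\sigma^2/L_0$ for all t sufficiently large. Now, using this in the equation for $\phi_a(t)$, $a\in R_i$, we obtain

$$\phi_a(t)=\sigma^2+\frac{1}{\delta}\psi_i(t)\le\Big(1+\frac{2}{\delta L_0}\Big)\sigma^2.\qquad (65)$$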
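We prove the other claims by repeatedly substituting in the previous bounds. In particular,

$$\psi_i(t)=\mathrm{mmse}\Big(\sum_{b\in R}W_{b,i}\,\phi_b(t-1)^{-1}\Big)\le\mathrm{mmse}\Big(\sum_{a\in R_i}W_{a,i}\,\phi_a(t-1)^{-1}\Big)\le\mathrm{mmse}\Big(\frac{L_0}{2\sigma^2}\Big)\le\frac{2\sigma^2}{L_0},\qquad (66)$$

where we used Eq. (65) in the penultimate inequality. Finally,

$$\phi_a(t)=\sigma^2+\frac{1}{\delta}\psi_i(t)\le\sigma^2+\frac{1}{\delta}\,\mathrm{mmse}\Big(\frac{L_0}{2\sigma^2}\Big),\qquad (67)$$

where the inequality follows from Eq. (66). □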
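The core of the proof is the scalar bound $\psi_i(t+1)\le(\sigma^2+\psi_i(t)/\delta)/L_0$. Iterating this upper-bound recursion (a sketch of the bound only, not of the full state evolution) shows the mechanism: it is a contraction with factor $1/(\delta L_0)<1/3$, whose fixed point $\sigma^2/(L_0-\delta^{-1})$ lies below $3\sigma^2/(2L_0)<2\sigma^2/L_0$ whenever $\delta L_0>3$. The starting value Var(X) is allowed by (61); the parameters in the example and the helper name are arbitrary.

```python
def lemma_5_4_bound(sigma2, delta, L0, psi_start, T):
    """Iterate psi <- (sigma2 + psi / delta) / L0 from psi_start; return the trajectory."""
    traj = [psi_start]
    for _ in range(T):
        traj.append((sigma2 + traj[-1] / delta) / L0)
    return traj


if __name__ == "__main__":
    sigma2, delta, L0 = 0.01, 0.5, 8            # delta * L0 = 4 > 3
    traj = lemma_5_4_bound(sigma2, delta, L0, psi_start=1.0, T=20)
    first = next(t for t, p in enumerate(traj) if p <= 2 * sigma2 / L0)
    print(first, traj[-1], sigma2 / (L0 - 1 / delta))
```

The index `first` plays the role of $t_0$ for these particular values: from that step on the bound (62) is in force.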

Next we prove a lower bound on the state evolution sequence. Here and below $C_0\equiv C\setminus\{-2\rho^{-1},\dots,-1\}=\{0,\dots,L-1\}$. Also, recall that $R_0=\{-\rho^{-1},\dots,0,\dots,L-1+\rho^{-1}\}$ (see Fig. 2).

Lemma 5.5. For any $t\ge 0$ and any $i\in C_0$, $\psi_i(t)\ge\mathrm{mmse}(2\sigma^{-2})$. Further, for any $a\in R_0$ and any $t\ge 0$ we have $\phi_a(t)\ge\sigma^2+(2\delta)^{-1}\mathrm{mmse}(2\sigma^{-2})$.

Proof. Since $\phi_a(t)\ge\sigma^2$ by definition, we have, for t > 0, $\psi_i(t)\ge\mathrm{mmse}\big(\sigma^{-2}\sum_b W_{b,i}\big)\ge\mathrm{mmse}(2\sigma^{-2})$, where we used the fact that the restriction of W to columns in $C_0$ is roughly column-stochastic. Plugging this into the expression for $\phi_a$, we get

$$\phi_a(t)\ge\sigma^2+\frac{1}{\delta}\sum_{i\in C}W_{a,i}\,\mathrm{mmse}(2\sigma^{-2})\ge\sigma^2+\frac{1}{2\delta}\,\mathrm{mmse}(2\sigma^{-2}).\qquad (68)$$

□

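Notice that for $L_{0,*}\ge 4$ and for all $L_0\ge L_{0,*}$, the upper bound for $\psi_i(t)$, $i\in\{-2\rho^{-1},\dots,-1\}$, given in Lemma 5.4 is below the lower bound for $\psi_i(t)$, with $i\in C_0$, given in Lemma 5.5; i.e., for all σ,

$$\mathrm{mmse}\Big(\frac{L_0}{2\sigma^2}\Big)\le\mathrm{mmse}\Big(\frac{2}{\sigma^2}\Big).\qquad (69)$$

Motivated by the above, we introduce modified state evolution maps $F'_W:\mathbb{R}_+^{R_0}\to\mathbb{R}_+^{C_0}$, $F''_W:\mathbb{R}_+^{C_0}\to\mathbb{R}_+^{R_0}$, by letting, for $\phi=(\phi_a)_{a\in R_0}\in\mathbb{R}_+^{R_0}$, $\psi=(\psi_i)_{i\in C_0}\in\mathbb{R}_+^{C_0}$, and for all $i\in C_0$, $a\in R_0$:

$$F'_W(\phi)_i=\mathrm{mmse}\Big(\sum_{b\in R_0}W_{b-i}\,\phi_b^{-1}\Big),\qquad (70)$$

$$F''_W(\psi)_a=\sigma^2+\frac{1}{\delta}\sum_{i\in\mathbb{Z}}W_{a-i}\,\psi_i,\qquad (71)$$

where, in the last equation, we set by convention $\psi_i(t)=\mathrm{mmse}(L_0/(2\sigma^2))$ for $i\le -1$, and $\psi_i=\infty$ for $i\ge L$. We also let $F_W\equiv F'_W\circ F''_W$.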
Definition 5.6. The modified state evolution sequence is the sequence $\{\phi(t),\psi(t)\}_{t\ge 0}$ with $\phi(t)=F''_W(\psi(t))$ and $\psi(t+1)=F'_W(\phi(t))$ for all $t\ge 0$, and $\psi_i(0)=\infty$ for all $i\in C_0$. We also adopt the convention that, for $i\ge L$, $\psi_i(t)=+\infty$ and, for $i\le -1$, $\psi_i(t)=\mathrm{mmse}(L_0/(2\sigma^2))$, for all t.

Lemma 5.4 then implies the following.

Lemma 5.7. Let $\{\phi(t),\psi(t)\}_{t\ge 0}$ denote the state evolution sequence as per Definition 2.3, and $\{\phi^{\rm mod}(t),\psi^{\rm mod}(t)\}_{t\ge 0}$ denote the modified state evolution sequence as per Definition 5.6. Then there exists $t_0$ (depending only on $p_X$) such that, for all $t\ge t_0$, $\phi(t)\preceq\phi^{\rm mod}(t-t_0)$ and $\psi(t)\preceq\psi^{\rm mod}(t-t_0)$.



Proof. Choose $t_0=t(L_0,\delta)$ as given by Lemma 5.4. We prove the claims by induction on t. For the induction basis ($t=t_0$), we have from Lemma 5.4 that $\psi_i(t_0)\le\mathrm{mmse}(L_0/(2\sigma^2))=\psi_i^{\rm mod}(0)$, for $i\le -1$. Also, we have $\psi_i^{\rm mod}(0)=\infty\ge\psi_i(t_0)$, for $i\ge 0$. Further,

$$\phi_a^{\rm mod}(0)=F''_W(\psi^{\rm mod}(0))_a\ge T''_W(\psi^{\rm mod}(0))_a\ge T''_W(\psi(t_0))_a=\phi_a(t_0),\qquad (72)$$

for $a\in R_0$. Here, the last inequality follows from the monotonicity of $T''_W$ (Proposition 5.2). Now, assume that the claim holds for t; we prove it for t + 1. For $i\in C_0$, we have

$$\psi_i^{\rm mod}(t+1-t_0)=F'_W(\phi^{\rm mod}(t-t_0))_i\ge T'_W(\phi^{\rm mod}(t-t_0))_i\ge T'_W(\phi(t))_i=\psi_i(t+1),\qquad (73)$$

where the last inequality follows from the monotonicity of $T'_W$ (Proposition 5.2) and the induction hypothesis. In addition, for $a\in R_0$,

$$\phi_a^{\rm mod}(t+1-t_0)=F''_W(\psi^{\rm mod}(t+1-t_0))_a\ge T''_W(\psi^{\rm mod}(t+1-t_0))_a\ge T''_W(\psi(t+1))_a=\phi_a(t+1).\qquad (74)$$

Here, the last inequality follows from the monotonicity of $T''_W$ and Eq. (73). □



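By Lemma 5.7, we can now focus on the modified state evolution sequence in order to prove Lemma 3.2. Notice that the mapping $F_W$ has a particularly simple description in terms of a shift-invariant state evolution mapping. Explicitly, define $T'_{W,\infty},T''_{W,\infty}:\mathbb{R}^{\mathbb{Z}}\to\mathbb{R}^{\mathbb{Z}}$ by letting, for $\phi,\psi\in\mathbb{R}^{\mathbb{Z}}$ and all $i,a\in\mathbb{Z}$:

$$T'_{W,\infty}(\phi)_i=\mathrm{mmse}\Big(\sum_{b\in\mathbb{Z}}W_{b-i}\,\phi_b^{-1}\Big),\qquad (75)$$

$$T''_{W,\infty}(\psi)_a=\sigma^2+\frac{1}{\delta}\sum_{i\in\mathbb{Z}}W_{a-i}\,\psi_i,\qquad (76)$$

and let $T_{W,\infty}\equiv T'_{W,\infty}\circ T''_{W,\infty}$. Further, define the embedding $H:\mathbb{R}^{C_0}\to\mathbb{R}^{\mathbb{Z}}$ by letting

$$(H\psi)_i=\begin{cases}\mathrm{mmse}(L_0/(2\sigma^2)) & \text{if } i<0,\\ \psi_i & \text{if } 0\le i\le L-1,\\ +\infty & \text{if } i\ge L,\end{cases}\qquad (77)$$

and the restriction mapping $H'_{a,b}:\mathbb{R}^{\mathbb{Z}}\to\mathbb{R}^{b-a+1}$ by $H'_{a,b}\psi=(\psi_a,\dots,\psi_b)$.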
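Lemma 5.8. With the above definitions, $F_W=H'_{0,L-1}\circ T_{W,\infty}\circ H$.

Proof. Clearly, for any $\psi=(\psi_i)_{i\in C_0}$, we have $T''_{W,\infty}\circ H(\psi)_a=F''_W(\psi)_a$ for $a\in R_0$, since the definition of the embedding H is consistent with the convention adopted in defining the modified state evolution. Moreover, for $i\in C_0=\{0,\dots,L-1\}$, we have

$$T'_{W,\infty}(\phi)_i=\mathrm{mmse}\Big(\sum_{b\in\mathbb{Z}}W_{b-i}\,\phi_b^{-1}\Big)=\mathrm{mmse}\Big(\sum_{-\rho^{-1}\le b\le L-1+\rho^{-1}}W_{b-i}\,\phi_b^{-1}\Big)=\mathrm{mmse}\Big(\sum_{b\in R_0}W_{b-i}\,\phi_b^{-1}\Big)=F'_W(\phi)_i.\qquad (78)$$

Hence, $T'_{W,\infty}\circ T''_{W,\infty}\circ H(\psi)_i=F'_W\circ F''_W(\psi)_i$ for $i\in C_0$. Therefore, $H'_{0,L-1}\circ T_{W,\infty}\circ H(\psi)=F_W(\psi)$ for any $\psi\in\mathbb{R}_+^{C_0}$, which completes the proof. □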



We will say that $\psi\in\mathbb{R}^K$ is nondecreasing if, for every $1\le i\le j\le K$, $\psi_i\le\psi_j$.

Lemma 5.9. If $\psi\in\mathbb{R}^{C_0}$ is nondecreasing, with $\psi_i\ge\mathrm{mmse}(L_0/(2\sigma^2))$ for all i, then $F_W(\psi)$ is nondecreasing as well. In particular, if $\{\phi(t),\psi(t)\}_{t\ge 0}$ is the modified state evolution sequence, then $\phi(t)$ and $\psi(t)$ are nondecreasing for all t.

Proof. By Lemma 5.8, we know that $F_W=H'_{0,L-1}\circ T_{W,\infty}\circ H$. We first notice that, by the assumption $\psi_i\ge\mathrm{mmse}(L_0/(2\sigma^2))$, we have that $H(\psi)$ is nondecreasing.

Next, if $\psi\in\mathbb{R}^{\mathbb{Z}}$ is nondecreasing, $T_{W,\infty}(\psi)$ is nondecreasing as well. In fact, the mappings $T'_{W,\infty}$ and $T''_{W,\infty}$ both preserve the nondecreasing property, since both are shift invariant, and $\mathrm{mmse}(\cdot)$ is a decreasing function. Finally, the restriction of a nondecreasing vector is obviously nondecreasing.

This proves that $F_W$ preserves the nondecreasing property. To conclude that $\psi(t)$ is nondecreasing for all t, notice that the condition $\psi_i(t)\ge\mathrm{mmse}(L_0/(2\sigma^2))$ is satisfied at all t by Lemma 5.5 and condition (69). The claim for $\psi(t)$ follows by induction.

Now, since $F''_W$ preserves the nondecreasing property, we have that $\phi(t)=F''_W(\psi(t))$ is nondecreasing for all t, as well. □
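Lemma 5.9, together with the monotone decrease in t (which follows, as in Proposition 5.3, from $\psi(0)=\infty$ and the monotonicity of $F_W$), can be observed numerically on the modified recursion (70), (71) with the boundary conventions of Definition 5.6. The sketch below is a direct transcription under illustrative assumptions: the Gaussian prior $X\sim\mathsf{N}(0,1)$ with $\mathrm{mmse}(s)=1/(1+s)$, the raised-cosine shape $\mathcal{W}(u)=\frac12(1+\cos\pi u)$, small arbitrary values of L, $L_0$, $\rho^{-1}$, δ, σ², and hypothetical helper names. Weights $W_{a-i}=0$ are skipped so that the convention $\psi_i=\infty$ for $i\ge L$ never produces $0\cdot\infty$.

```python
import math


def modified_state_evolution(L, L0, inv_rho, delta, sigma2, mmse, shape, T):
    """Iterate (70)-(71) on C_0 = {0..L-1}, R_0 = {-1/rho..L-1+1/rho}; return [psi(0), ..., psi(T)]."""
    rho = 1.0 / inv_rho
    w = {k: rho * shape(rho * k) for k in range(-inv_rho, inv_rho + 1)}   # W_{a-i}
    psi_left = mmse(L0 / (2.0 * sigma2))        # convention for i <= -1

    def psi_ext(psi, i):
        # Extension of psi to all of Z, as in Definition 5.6 and the embedding H.
        if i < 0:
            return psi_left
        return math.inf if i >= L else psi[i]

    psi = [math.inf] * L                         # psi_i(0) = inf on C_0
    history = [psi]
    for _ in range(T):
        phi = {a: sigma2 + sum(w[a - i] * psi_ext(psi, i)
                               for i in range(a - inv_rho, a + inv_rho + 1) if w[a - i] > 0) / delta
               for a in range(-inv_rho, L + inv_rho)}                    # (71)
        psi = [mmse(sum(w[b - i] / phi[b]
                        for b in range(i - inv_rho, i + inv_rho + 1) if w[b - i] > 0))
               for i in range(L)]                                        # (70)
        history.append(psi)
    return history


def gaussian_mmse(s):
    return 1.0 / (1.0 + s)


def raised_cosine(u):
    return 0.5 * (1.0 + math.cos(math.pi * u)) if abs(u) <= 1.0 else 0.0


if __name__ == "__main__":
    hist = modified_state_evolution(L=12, L0=4, inv_rho=2, delta=0.5, sigma2=0.05,
                                    mmse=gaussian_mmse, shape=raised_cosine, T=30)
    print([round(p, 4) for p in hist[-1]])
```

Every profile `hist[t]` is nondecreasing in i, and every coordinate is nonincreasing in t; swapping in the mmse of a low-dimensional prior shows the left-to-right reconstruction wave described in Section 1.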

5.1 Continuum state evolution 

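We start by defining the continuum state evolution mappings. For $\Omega\subseteq\mathbb{R}$, let $\mathcal{L}(\Omega)$ be the space of non-negative measurable functions on Ω (up to measure-zero redefinitions). Define $\mathcal{F}'_{\mathcal{W}}:\mathcal{L}([-1,\ell+1])\to\mathcal{L}([0,\ell])$ and $\mathcal{F}''_{\mathcal{W}}:\mathcal{L}([0,\ell])\to\mathcal{L}([-1,\ell+1])$ as follows. For $\phi\in\mathcal{L}([-1,\ell+1])$, $\psi\in\mathcal{L}([0,\ell])$, and for all $x\in[0,\ell]$, $y\in[-1,\ell+1]$, we let

$$\mathcal{F}'_{\mathcal{W}}(\phi)(x)=\mathrm{mmse}\Big(\int_{-1}^{\ell+1}\mathcal{W}(x-z)\,\phi^{-1}(z)\,dz\Big),\qquad (79)$$

$$\mathcal{F}''_{\mathcal{W}}(\psi)(y)=\sigma^2+\frac{1}{\delta}\int_{\mathbb{R}}\mathcal{W}(y-x)\,\psi(x)\,dx,\qquad (80)$$

where we adopt the convention that $\psi(x)=\mathrm{mmse}(L_0/(2\sigma^2))$ for x < 0, and $\psi(x)=\infty$ for $x>\ell$.

Definition 5.10. The continuum state evolution sequence is the sequence $\{\phi(\cdot;t),\psi(\cdot;t)\}_{t\ge 0}$, with $\phi(t)=\mathcal{F}''_{\mathcal{W}}(\psi(t))$ and $\psi(t+1)=\mathcal{F}'_{\mathcal{W}}(\phi(t))$ for all $t\ge 0$, and $\psi(x;0)=\infty$ for all $x\in[0,\ell]$.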
Recalling Eq. (61), we have $\psi(x;t)=\mathcal{F}'_{\mathcal{W}}(\phi(t-1))(x)\le\mathrm{Var}(X)$, for $t\ge 1$. Also, $\phi(x;t)=\mathcal{F}''_{\mathcal{W}}(\psi(t))(x)\le\sigma^2+(1/\delta)\mathrm{Var}(X)$, for $t\ge 1$. Define

$$\Phi_M\equiv 1+\frac{1}{\delta}\mathrm{Var}(X).\qquad (81)$$

Assuming σ < 1, we have $\phi(x;t)<\Phi_M$, for all $t\ge 1$.

Lemma 5.11. Let $\{\phi(\cdot;t),\psi(\cdot;t)\}_{t\ge 0}$ be the continuum state evolution sequence and $\{\phi(t),\psi(t)\}_{t\ge 0}$ be the modified discrete state evolution sequence, with parameters ρ and $L=\ell/\rho$. Then, for any t > 0,

$$\lim_{\rho\to 0}\frac{1}{L}\sum_{i=0}^{L-1}\big|\psi_i(t)-\psi(\rho i;t)\big|=0,\qquad (82)$$

$$\lim_{\rho\to 0}\frac{1}{L}\sum_{a=-\rho^{-1}}^{L-\rho^{-1}-1}\big|\phi_a(t)-\phi(\rho a;t)\big|=0.\qquad (83)$$

Lemma 5.11 is proved in Appendix A.

Corollary 5.12. The continuum state evolution sequence $\{\phi(\cdot;t),\psi(\cdot;t)\}_{t\ge 0}$, with initial condition $\psi(x)=\mathrm{mmse}(L_0/(2\sigma^2))$ for x < 0, and $\psi(x)=\infty$ for $x>\ell$, is monotone decreasing, in the sense that $\phi(x;0)\ge\phi(x;1)\ge\phi(x;2)\ge\cdots$ and $\psi(x;0)\ge\psi(x;1)\ge\psi(x;2)\ge\cdots$, for all $x\in[0,\ell]$.

Proof. Follows immediately from Proposition 5.3 and Lemma 5.11. □

Corollary 5.13. Let $\{\phi(\cdot;t),\psi(\cdot;t)\}_{t\ge 2}$ be the continuum state evolution sequence. Then, for any t, $x\mapsto\psi(x;t)$ and $x\mapsto\phi(x;t)$ are nondecreasing Lipschitz continuous functions.

Proof. The nondecreasing property of the functions $x\mapsto\psi(x;t)$ and $x\mapsto\phi(x;t)$ follows immediately from Lemmas 5.9 and 5.11. Further, since $\psi(x;t)$ is bounded for $t\ge 1$, and $\mathcal{W}(\cdot)$ is Lipschitz continuous, recalling Eq. (80), the function $x\mapsto\phi(x;t)$ is Lipschitz continuous as well, for $t\ge 1$. Similarly, since $\sigma^2\le\phi(x;t)\le\Phi_M$, invoking Eq. (79), the function $x\mapsto\psi(x;t)$ is Lipschitz continuous for $t\ge 2$. □



Free Energy. We define the mutual information between X and a noisy observation of X at signal-to-noise ratio s by

$$\mathrm{I}(s)\equiv I(X;\sqrt{s}\,X+Z),\qquad (84)$$

with $Z\sim\mathsf{N}(0,1)$ independent of $X\sim p_X$. Recall the relation [GSV05]

$$\frac{d}{ds}\mathrm{I}(s)=\frac{1}{2}\,\mathrm{mmse}(s).\qquad (85)$$
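The relation (85) is easy to check numerically for a prior where both sides reduce to one-dimensional Gaussian integrals. For the symmetric binary prior $X\in\{-1,+1\}$ (a hypothetical example, not one singled out in the text), the standard expressions are $\mathrm{I}(s)=s-\mathbb{E}\log\cosh(s+\sqrt{s}Z)$ in nats and $\mathrm{mmse}(s)=1-\mathbb{E}\tanh(s+\sqrt{s}Z)$. The Python sketch below approximates the Gaussian expectations with a trapezoidal rule on [−10, 10] (an accuracy assumption that is ample for these smooth integrands) and compares a finite-difference derivative of I with mmse/2; the helper names are hypothetical.

```python
import math


def gauss_expect(f, half_width=10.0, n=4000):
    """Trapezoidal approximation of E f(Z), Z ~ N(0, 1), truncated to [-half_width, half_width]."""
    h = 2.0 * half_width / n
    total = 0.0
    for k in range(n + 1):
        z = -half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(z) * math.exp(-0.5 * z * z)
    return total * h / math.sqrt(2.0 * math.pi)


def log_cosh(x):
    """Overflow-safe log(cosh(x)) = |x| + log1p(exp(-2|x|)) - log 2."""
    ax = abs(x)
    return ax + math.log1p(math.exp(-2.0 * ax)) - math.log(2.0)


def mutual_info_binary(s):
    """I(s) = I(X; sqrt(s) X + Z) in nats for X uniform on {-1, +1}."""
    return s - gauss_expect(lambda z: log_cosh(s + math.sqrt(s) * z))


def mmse_binary(s):
    """mmse(s) = 1 - E tanh(s + sqrt(s) Z) for X uniform on {-1, +1}."""
    return 1.0 - gauss_expect(lambda z: math.tanh(s + math.sqrt(s) * z))


if __name__ == "__main__":
    h = 1e-4
    for s in (0.5, 2.0, 5.0):
        dI = (mutual_info_binary(s + h) - mutual_info_binary(s - h)) / (2 * h)
        print(s, dI, 0.5 * mmse_binary(s))    # the last two columns should agree
```

For this discrete prior $\mathrm{I}(s)$ saturates at log 2 as s grows, consistent with Proposition 5.14 below, since the Rényi information dimension of a discrete distribution is 0.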

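Furthermore, the following identities relate the scaling law of mutual information under weak noise to the Rényi information dimension [WV11a].

Proposition 5.14. Assume $H(\lfloor X\rfloor)<\infty$. Then

$$\liminf_{s\to\infty}\frac{\mathrm{I}(s)}{\frac{1}{2}\log s}=\underline{d}(p_X),\qquad \limsup_{s\to\infty}\frac{\mathrm{I}(s)}{\frac{1}{2}\log s}=d(p_X),\qquad (86)$$

where $\underline{d}(p_X)$ and $d(p_X)$ denote, respectively, the lower and upper Rényi information dimensions of $p_X$.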
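A key role in our analysis is played by the free energy functional.

Definition 5.15. Let $\mathcal{W}(\cdot)$ be a shape function, and σ, δ > 0 be given. The corresponding free energy is the functional $\mathcal{E}_{\mathcal{W}}:\mathcal{L}([-1,\ell+1])\to\mathbb{R}$ defined as follows for $\phi\in\mathcal{L}([-1,\ell+1])$:

$$\mathcal{E}_{\mathcal{W}}(\phi)=\frac{\delta}{2}\int_{-1}^{\ell-1}\Big\{\frac{\sigma^2_{\mathcal{W}}(x)}{\phi(x)}+\log\phi(x)\Big\}dx+\int_0^\ell\mathrm{I}\Big(\int_{-1}^{\ell-1}\mathcal{W}(x-z)\,\phi^{-1}(z)\,dz\Big)dx,\qquad (87)$$

where

$$\sigma^2_{\mathcal{W}}(x)=\sigma^2+\frac{1}{\delta}\Big(\int_{y\le 0}\mathcal{W}(y-x)\,dy\Big)\mathrm{mmse}\Big(\frac{L_0}{2\sigma^2}\Big).\qquad (88)$$

Viewing $\mathcal{E}_{\mathcal{W}}$ as a function defined on the Banach space $L_2([-1,\ell])$, we will denote by $\nabla\mathcal{E}_{\mathcal{W}}(\phi)$ its Fréchet derivative at φ. This will be identified, via standard duality, with a function in $L_2([-1,\ell])$. It is not hard to show that the Fréchet derivative exists on $\{\phi:\phi(x)>\sigma^2\}$ and is such that

$$\nabla\mathcal{E}_{\mathcal{W}}(\phi)(y)=\frac{\delta}{2\phi^2(y)}\Big\{\phi(y)-\sigma^2_{\mathcal{W}}(y)-\frac{1}{\delta}\int_0^\ell\mathcal{W}(x-y)\,\mathrm{mmse}\Big(\int_{-1}^{\ell-1}\mathcal{W}(x-z)\,\phi^{-1}(z)\,dz\Big)dx\Big\},\qquad (89)$$

for $-1\le y\le\ell-1$.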

Corollary 5.16. If is the fixed point of the continuum state evolution, then VEw{4>){y) = 0, 

for -l<y<l-l. 

Proof. We have (j) = ^w^'^) ^^"^ ~ •^vv('^)' whereby for— l<y<£— 1, 
(t>{y) = 0-^ + 7 / yV{y - x)'tlj{x)dx 



(90) 



6 „ 



+ \( 1 yV(y — a;)dx) mmse 
- J >V(y — x)mmse^y yV{x — z)(j) ^{z)dzjdx 
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- J W(y — x)mmse^y W{x — z)(f) ^{z)dzjdx. 



The result follows immediately from Eq. ( 89 ) . □ 
Definition 5.17. Define the potential function V : M+ — )• M+ as follows. 

V{cP) = ^-[^ + log<p)+\{<p-'). (91) 



Using Eq. (86), we have for <C 1, 



A 2 1 

v{<p) < -{j + logcp) + -d{px) log(r') 

= ^ + ^[^-^(Px)]log(0). 



(92) 
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Define 



2,1 / -^^0 \ 



(93) 



Notice that cr^ < 0* < (1 + 2/{5Lo))a'^ < 2^2, given tliat 6Lq > 3. The following proposition upper 
bounds V{(t)*) and its proof is deferred to Appendix [B| 



Proposition 5.18. There exists 02 > 0, such that, for a G (0, £12], we have 

5 6-d{px) 



V{4>1 < 2 + 



log(2a^). 



Now, we write the energy functional in term of the potential function. 
Ew(</') = 

with, 



VicPix)) dx + - / dx + Ew(</'), 



1 



4>{x) 



Ew(0) 



{\{W*cp-\y))-\{r\y-l))}dy. 



(94) 



(95) 



(96) 



Lemma 5.19. Let 5 > 0, and px be a probability measure on the real line with 5 > d{px)- For any 
K > 0, there exist £0, Gq, such that, for any i > £0 O'^d a G (0, ao], and any fixed point of continuum 
state evolution,{(j),ip} , with tp and cp nondecreasing Lipschitz functions andtp{x) > mmse(Lo/(2a"^)), 
the following holds. 



(x) — 0*1 dx < ni. 



(97) 



Proof. The claim is trivial for k > $Af, since (j){x) < ^m- Fix k < ^m, and choose cJi, such that 
(j)* < k/2, for a G (0, fii]. Since (p is a fi xed p oint of continuum state evolution, we have VEy^;{(j)) = 0, 
on the interval [—1,£— 1] by Corollary 5.16 Now, assume that /f^^ 10(2;) ~4'*\ > We introduce 
an infinitesimal perturbation of (p that decreases the energy in the first order; this contradicts the 
fact VEw(0) = on the interval [—!,£ — 1]. 

Claim 5.20. For each fixed point of continuum state evolution that satisfies the hypothesis of 



Lemma 5.19, the following holds. For any K > 0, there exists £q, such that, for £ > £q there 
exist xi < X2 G [0,£ — 1), with X2 — xi = K and k/2 + (p* < (p{x), for x G [xi,X2]. 



Claim 5.20 is proved in Appendix [Q 

Fix K > 2 and let xq = (xi + X2)/2. Thus, xq > 1. For a G (0, 1], define 



(paix) 



cPix) 



X2-X0 



ax2 



(p{x - a), 



X2-xo-a' 



for X2 < X, 
for X G [xo + a, X2), 
for X G [— 1 + a, xq + a), 
for X G [-1, -1 + a). 



(98) 



See Fig. [3] for an illustration. (Note that from Eq. (80), 
the difference of the free GiiGrgics of functions (j) and (pa- 



-I) 



In the following, we bound 
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-1 -1 + CI 



l-l 



Figure 3: An illustration of function 4>{x) and its perturbation (j>a{x)- 



Proposition 5.21. For each fixed point of continuum state evolution, satisfying the hypothesis of 

} dx < C{K)a. 



Lemma 5.19, there exists a constant C{K), such that 

-1 '(72(x) a2(x)-a2 



4>a{x) 



We refer to Appendix [D] for tlie proof of Proposition 5.21 



Proposition 5.22. For each fixed point of continuum state evolution, satisfying the hypothesis of 
Lemma 5.19, there exists a constant C{k,K), such that, 

Ew(0a)-Ew(0) <C(K,K)a. 



Proof of Proposition 5.22| is deferred to Appendix |Ej 
Using Eq. (95) and Proposition 5.22, we have 

E>v((/'a) - Eyi; 



< 



£-1 



-1 



{V{Mx)) - V{ct)ix))}dx + C(k, K)a, 



where the constants {5/2)C{K) and C{k,K) are absorbed in C{k,K). 
In addition, 



t-i 



{V{M^))-V{(t>{x))}dx= / {Vi4>a{x))-V{^{x))}dx 



+ 



+ 



+ 



X2 



xo+a 
xo+a 



X2 



V{(f)aix))dx- / V{(l){x))dx 



xo 



viMx))dx 



l+a 
1+a 



xo 



V{(l){x))dx 



V{Mx))dx. 



(99) 



(100) 
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Notice that the first and the third terms on the right hand side are zero. Also, 



2-' 2 



XQ+a 



^2 



V{(l)a{x))dx- / V{(t>{x))dx 



a 



Xo 
-l+a 



X2 - Xo 



X2 



V{^{x))dx, 



xo 



(101) 



V{Mx))dx = aV{(l)*). 



-1 



Substituting Eq. (101) in Eq. (100), we get 



{V{Mx)) - V{cl){x))}dx = 

I X2 — Xq 



We proceed by proving the fohowing claim. 



X2 



{V{cl>*)-V{<P{x))}dx. 



(102) 



xo 



Claim 5.23. For any C = C{k,K) > 0, there exists as, such that for a € (0,(73], the following 
holds. 



{ViMx)) - Vi(t>{x))}dx < -2C7(k, K)a. 



Proof. By Proposition 5.18 we have 



(103) 



(104) 



for a £ (0,cJ2]. Also, since (p{x) > k/2 for x G [xo,X2], we have V{4>{x)) > ((5/2)log(/) > 
(5/2)log(K/2). Therefore, 



I f\v{Ux))-V{<t^{x))] 



dx 



2{X2 - Xo) 



^2 

{V{<P*)-V{m)]dx 



Xo 



< 



6_ ^ 6 - d(px 
2 



+ log(2a2) - ^ log 



-log(-) 
2 ^2' 



(105) 



It is now obvious that by choosing cjs > small enough, we can ensure that for values a E (0,(73], 

(106) 



5 ^ (5 - d{px) 
2 



+ ^ log(2a2) - ^ log(^) < -2C{k, K)a. 



(Notice that the right hand side of Eq. (106) does not depend on a). 



□ 



Let do = iiiiii{(Ti, CT2, CT3}. As a result of Eq. (99) and Claim 5.23 



Ew(<Aa) - Ew(0) < 



{y((/.„(x)) - y(0(x))}dx + C(k, K)a 



(107) 



< -C{K,K)a. 



Since </) is a Lipschitz function by assumption, it is easy to see that {{(pa — ^'{{2 < C a, for some 
constant C. By Taylor expansion of the free energy functional around function (p, we have 



(VEw((/'), (pa-^) = ^wiM - Ew(0) + 0{{{(pa 

< -C{K,K)a + o{a). 



(108) 
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However, since {4>,ip} is a fixed point of the continuum state evolution, we have VEyv((/)) = on 



the interval [—1,£— 1] (cf. Corollary 5.16). Also, (j)a — 4> is zero out of [—1,1— 1]. Therefore, 



(VEw;((/)), 0a — <p) = 0, which leads to a contradiction in Eq (108). This implies that our first 
assumption fi-^^ {(/^i^) — dx > k£ is false. The result follows. □ 

Next lemma pertains to the robust reconstruction of the signal. Prior to stating the lemma, we 
need to establish some definitions. Due to technical reasons in the proof, we consider an alternative 



decomposition of Ew((/') to Eq. (95). 



Define the potential function Vj-o\, : M+ — )• M+ as follows. 



c 2 

Vroh{cp) = - (^+log<P), (109) 



and decompose the Energy functional as: 

Vrol^icPix)) dx+- ^"Vt'." dx + Ew,rob(0), (HO) 

-1 2 7_i (l){x) 

with, 

Ew,rob('/') = / \{W*r\y))dy. (Ill) 







Lemma 5.24. Let 5 > and px be a probability measure on the real line with 6 > D[px)- There 
exist £o, (Tq, and C, such that , for any i > £o and a G (0, ao], and for any fixed point of continuum 
state evolution, withip and (f) nondecreasing Lipschitz functions andip{x) > mmse(Lo/(2(T^)), 

the following holds. 

I (/>(x) -(/>*! dx < Ca^l (112) 



5.19 



Proof. Suppose \4^{x) — i;^>*|dx > Ca'^i, for any constant C. Similar to the proof of Lemma 
we obtain an infinitesimal perturbation of (p which decreases the free energy in the first order, 
contradicting the fact VEw(0) = on the interval [-!,£ — 1]. 

By definition of upper MMSE dimension (Eq. (|15|)), for any e > 0, there exists 0i, such that, for 
[0,(/.i], 

mmse((/>-i) < (D{px) + £)</>. (113) 
Claim 5.25. For each fixed point of continuum state evolution that satisfies the hypothesis of 



Lemma 5.24, following holds. For any K > 0, there exists Iq, such that, for £ > £q there 
exist xi < X2 € [0,i — 1), with X2 — xi = K and < 0(x) < for x G [xi,X2]. 



Claim 5.25 is proved in Appendix [Fj For positive values of a, define 




for X < xi,X2 < X, 
for X G (xi, X2). 

Our aim is to show that Eyv((/)a) — Eyv(0) < — c a, for some constant c > 0. 
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Invoking Eq. (95), we have 



1 



+ 



(115) 



dx + Ew,rob((Aa) - Ew.rob (</>)• 



The following proposition bounds each term on the right hand side separately. 
Proposition 5.26. For the function (j){x) and its perturbation (paix), we have 



' Vrob(0a(x)) - < log(l - a) + K 

e-1 



5a 
C{l-a) 



{a\x)-a') 



1 



1 



1 



[X] 



[X 



dx< K 



2a 



C(l-a)' 



Ew,rob(<Aa) - Ew,rob(<^) < - ^^^^j ^ ^ {K + 2) log(l - a) 



(116) 
(117) 
(118) 



We refer to Appendix [G] for the proof of Proposition 5.26 



Combining the bounds given by Proposition 5.26[ we obtain 



-WWa) 



< - log(l -a){S- {D{px) + £)(1 + -)} + K— 



25a 



(119) 



Since 5 > D{px) by our assumption, there exist e,K,C such that 

c = 5- (D{px) + e)(l + 4) - -^J^ > 0. 



Using Eq. (119), we get 



K' C{l-a] 



cK 



(120) 



By an argument analogous to the one in the proof of Lemma 5.19 this is in contradiction with 
VEn;((/)) = 0. The result fohows. □ 

5.2 Proof of Lemma 13.21 



By Lemma [5j[ (t)a{t) < (j)'^°'^{t - to), for a G Rq = • • • ,L -1 + p'^} and t > ti{Lo,5). 

Therefore, we only need to prove the claim for the modified state evolution. The idea of the proof is 
as follows. In the previous section, we analyzed the continuum state evolution and showed that at 
the fixed point, the function <j){x) is close to the constant (p* . Also, in Lemma 5.11, we proved that 
the modified state evolution is essentially approximated by the continuum state evolution as /? — )• 0. 
Combining these results implies the thesis. 



Proof (Part(a)). By monotonicity of continuum state evolution (cf. Corollary 5.12), lim(_j.oo (t^{x', t) = 
(p{x) exists. Further, by continuity of state evolution recursions, (j){x) is a fixed point. Finally, (p{x) 
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is a nondecreasing Lipschitz function (cf. Corollary 5.13). Using Lemma 5.19 in conjunction with 
the Dominated Convergence theorem, we have, for any e > 



lim 



£-1 



{x;t)-^*\dx< -, 



fl21) 



for a e (0,(To] and i > Iq. Therefore, there exists t2 > such that j J^^^ \(j){x;t2) — (j)*\dx < e/2. 
Moreover, for any t > 0, 



1 



- (/>*|dx = lini - V \(t>{pa;t) 
p— ^0 I' ^ — ' 



1 



L-p-i-l 



lim- \Hpa;t)-(^*\. (122) 



a=—p ^ 

By triangle inequality, for any t > 0, 
i 



a=—p 



L-p-l-1 



lim } V |</.a(t) - < lim \ V - 0(pa; t)| + lim \ V |0(pa; t) 



a=—p~ 



a=—p~ 



(123) 



1 



£-1 



(x; t) — (/)*|d3;. 



where the last step follows from Lemma 5.11 and Eq. (122). Since the sequence {(t>{t)} is monotone 
decreasing in t, we have 



L-p-i-l 



L-p-i-l 



a=—p~ 



lini lim - V 0a(f)<lim- V 0a(i2) 

p^O J.OO 1j ■'^ — ' p— !>0 ■'^ — ' 

L-p-i-1 

<lim- Q^^(t2)-<f>*\+4>*) 
P^O L 



a=—p 



(124) 



< 

e 

< - + 
-2 



(x; t2) — 0*|dx + > 



Finally, 



L+p-i-l 

lim V 4>a{t) < 
t-^oo — ' 



a=—p~^ 



2P~ ^ £ 
^^^^ + 2 + 



(125) 



< + - + 2ao. 



Clearly, by choosing large enough and (Tq sufficiently small, we can ensure that the right hand 
side of Eq. (125) is less than e. □ 



Proof (Part(h)). Consider the following two cases. 



32 



(T < (Tq: In this case, proceeding along the same Hnes as the proof of Part (a), and using 
Lemma 15.241 in Heu of Lemma 15.191 we have 



hm - V Mt) < (126) 

t— >oo J_j ^ — ' 

a=— p— 1 

for some constant Ci. 
• a > (Tq: Since (j)a{t) < for any t > 0, we have 

^ L-p-i-l 

hm - V Mt) < ^M. (127) 

t-^oo h ^ — ' 

a=— p-1 

Choosing C = max{Ci, ^m/o'o} proves the claim in both cases. □ 
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A Proof of Lemma 



5.11 



We prove the first claim, Eq. (82). The second one follows by a similar argument. The proof 
uses induction on t. It is a simple exercise to show that the induction basis {t = 1) holds (the 
calculation follows the same lines as the induction step). Assuming the claim for t, we write, for 
i E {0,1,...,L- 1} 



\Mt + i)-i^ipi;t + i)\ 



mmse 



— mmse 



< 



W{z-pi) k' + T/ W{z-y)^{y-t)dy]-\\z) 

mmse( ^ pW{p{b - i)) + ^ ^ P'^^P^^ " 

W{z-pi) W^ + j >V(z-y)^(y;t)dy]-Mz 
1 ^ JR ' 



(128) 



— mmse 



Now, we bound the two terms on the right hand side separately. Note that the arguments of mmse( • ) 
in the above terms are at most 2/cj^. Since mmse has a continuous derivative, there exists a constant 
C such that \dlds mmse(s)| < C, for s G [0,2/cr^]. Then, considering the first term in the upper 
bound ( |128[ ), we have 



feeRo 



<c\Y Wb-^ {W' + ]Y. Wb-jMt)]-' - + J E w,.Mpr,tr') 

C li ^"^ 



beRo 



C 



j=-oo 
L-1 



6a^ 

C 



beRo j=-oo 

L-l 

E ( E Wb-^W,.,) |V(pj;t) - ^,(t)| 



(129) 



j=0 beRo 



c 



L-l 



c'p 



^^(E^'J Ei^(/^j''*)-^^(*)i 



j=0 



i=o 
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Here we used Yliez^i ~ X^iez /'^^(P^)^ — ^Yl\i\<p-'^ — (where the first inequahty follows 
from the fact that W is bounded). 

To bound the second term in Eq. (128), note that 



feeRo 



mmse( J] - i)) + ^ E P^^P^'' " •^■))^('^-?' 

/ W{z - pi) + T / >V(z - y)'tP{y; t)dy]-'dz 
J — 1 i/ M 



mmse 



beRo 



£+1 
1 



W{z-pi) [cT^ + ] [ Wiz-y)ij{y;t)dy]^^dz 



<c\Y^ pW{p{b - i)) W' + ]Y1 p^(p(^ - ■?'))^('«^' 

feeRo jGZ 

- P^ipib - i)) + 7 / ^(pb - y)V'(y; t)dy]-' 
+ c| E pW{p{b - i)) + ] [ Mpb - t)dy]-^dz 



(130) 



beRo 



1 



W{z - pi) + ^ / W(z - y)V'(y; t)dy]-^dz 



<-^^Y.p'^^p^^-^Y.p^^^p^'^p^^- I ^iipb-^y)^y 

+ C| j;pF2(p6)- / F, 
beRo -^-1 



(z)dz 



where Fi{x;y) = W{x - y)ip{y;t) and F2{z) = W{z - pi) [ci^ + j /ir>V(2; - y)ij{y;t)dy] ^ Since 
the functions W( • ) and V'( ' ) have continuos (and thus bounded) derivative on compact interval 
[0,i], the same is true for Fi and i*2- Using the standard convergence of Riemann sums to Riemann 
integrals, right hand side of Eq. (130) can be bounded by C^p/da'^, for some constant C3. Let 
ei{t) = \^i{t) - ij{pi;t)\. Combining Eqs. ( [1291 ) and ([TSO]), we get 

L-l 



ei{t + l)<£^[C'J2^jit) + C3 

j=0 



fl31) 



Therefore, 



i=0 \ j=0 



(132) 



The claims follows from the induction hypothesis. 
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B Proof of Proposition |5.18 



By Eq. (86), for any e > 0, there exists (po, such that for < < </>o, 



< 



d{px)+e 1 
^ log('/' )■ 



Therefore, 



5(T^ 6 - d{px) - e 



logc/). 



(133) 



(134) 



Now let e = {6 - d{px))/2 and a2 = \/(/>o/2. Hence, for a G (0,(72], we get 4>* < 2(7^ < (/)o. 
Plugging in 0* for i;^ in the above equation, we get 



ba^ 6 - d{px) 



+ 



log (p* 



(135) 



C Proof of Claim 5.20 



Recall that k < and (j){x) is nondecreasing. Let 

0<^=?^<1. 



We show that 
obtain 



M - 2 

1) > k/2 + 0*. If this is not true, using the nondecreasing property of (j){x), we 



(x) — 0*1 dx 



|(/>(x) — (/>*| dx ■ 



-1 



ri-l 
/ l'A( 

Jet-i 



x) — 4>*\ dx 



< -0£ + $^,(l-^)£ 



(136) 



contradicting our assumption. Therefore, (/"(x) > k/2 + (p* , for — l<x<£ — 1. For given i^, 
choose Iq = K/{1 — 9). Hence, for £ > i^, interval [OH. — 1,^—1) has length at least K. The result 
follows. 



D Proof of Proposition 5.21 



We first establish some properties of function cr^ 



Remark D.l. The function a'^{x) as defined in Eq. (88), is non increasing in x. Also, cj^(x) = 
(T^ + mmse(Lo/(2cr^)), for x < — 1 and o"^(x) = cr^, for x > 1. For 6Lq > 3, we have 

(j2 < (x) < 20-2. 
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Remark D.2. The function (T^(x)/(T^ is Lipschitz continuous. More specifically, there exists a 
constant C, such that, |cr^(ai) — (T^(a2)| < Ca'^\a2 — ai\, for any two values 01,02- Further, if 
Lq6 > 3 we can take C < 1. 



The proof of Remarks D.l and D.2 are immediate from Eq. (88). 

To prove the proposition, we spht the integral over the intervals [—1, — 1 + a), [— 1 + a, xo + a), [xo + 
a, X2), [x2, ^ — 1), and bound each one separately. Firstly, note that 



'1 .^2 



X2 



(pa{x) (^(x) 



dx = 0, 



(137) 



since 4>a{x) and (j){x) are identical for x > X2- 

Secondly, let a = {x2 — xo)/{x2 — xq — a), and (3 = {ax2)/{x2 — xq — a). Then, 



J xo+a ^ 'Pa 



a^x)-a^ aHx)-a' 



dx 



X2 o-2f5±^ 



(x) (/)(x) 

)-a^dx p a2(x)-cj2 



XO 
^2 



4>{x) a J^^+a <p{x) 



dx 



/{■ 

Jxn ^ < 



\dx + 

Jxo 



(a) 1 /■^•2 

<T2 



< 



a' 
1 

^2 



2:2 



XO 



1 + /3\ 2/ N 

-cj ( ) - a (x 

a a 



dx+ 1 



,2.- + /3 



a 



)dx+-J 





a 


a) ^ 


1x0 


^2 


^x + (3, 





"T dx + 



dx 



xo+a ^2 



XO 



<y'Hx\ 



< ( 1 



a 
1 
a 



^2 2/a; + /3^ 



XO 



CJ 



a2(x) 



a 

K , 

dx + — 1 



K 



dx 



(138) 



dxH 1 +a 



a 



a 



+ a 



K + CK^ 1 



+ CKa+— 1 



+ a 



< C{K)a, 



where (o) follows from the fact a"^ < (/>(x) and Remark D.l, (6) follows from Remark D.2 
Thirdly, recall that 4>a{x) = 4>{x — a), for x G [—1 + a, xq + a). Therefore, 

rxo+a ,^2(^) _^2 ^2(^)_^2, 



rxo+a 
J -1+a ^ 0a 



(x) 



(/>(x) 



^0 a2(x + a) -g^ 

-1 '/'(a;) 
^0 a^{x + a)-a^{x 



dx 



j dx 

-1+a (f>{x) 



dx 



^0+'^ a^(x) - a 



dx 
2 



dx + 



XO 



L 0(2;) 



dx 



< + + 

< a, 



-1+a ^2 



dx 



-1 



(139) 



where the first inequality follows from Remark D.l and the second follows from (f){x) > a 
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Finally, using the facts < o"^(x) < 2a^, and cr^ < (j){x), we have 



(j){x) 



dx < a. 



Combining Eqs. (137), (138), (139), and (140) implies the desired result 



Ew,i(0a) + Ew,2(0a) + E>v,3 ((/)a) , where 



E Proof of Proposition |5.22 

Proof. Let Ew((/)a) 

y= r \KW*cp-\y))-K<p-\y-l))}dy, 

J X()+a 
co+a 

{\{W*c^-\y))-\{c^-\y-mdy, 
\\{W*(p-\y))-\{(l,-\y-mdy. 



Also let E|v(0) = E 



Ew,i(<; 

Ew,2(0a) 
Ew,3(</'a) 

w,i(</>) + Ew,2,3(0), where 



-W,l 



(</.) = /" * r '(y)) - KrHy - mdy, 

J XQ+a 



Ew,2,3(</') 



XQ 

xo+a 



{\{W*r'{y))-\{r'{y-mdy. 



(140) 



(141) 



(142) 



The following remark is used several times in the proof. 
Remark E.l. For any two values < ai < a2, 

• Bounding ^w,i{<t)a) - Ew,i (</>)• 

Notice that the functions 4>{x) = 4>a{x)-, for X2 < x. Also k/2 < 4>a{x) < </)(j;) < ^m, for xi < x < X2- 
Let a = {x2 — xi)/{x2 — xi — a), and f3 = {ax2)/{x2 — xi — a). Then, 4>a{x) = cj){ax — /3) for 
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X G [xo + a, X2). Hence, 

Ew,i(<Aa) - E>v,i('/') 



= / KW*<l>-\y))-\{W*r\y))dy+ / \{ct>-\y - 1)) - \{ct>-\y - 1)) dy 

J xo+a JxQ+a 

1 fX2 + l 1 

^ fX2 + l , fX2 (•X2 s 

<^ / ( / >V(y-^)<^-i(z)dz- / W{y-z)<t>-\z)dzyy 

^ J xn+a J xn+a—1 J xn+a—\ 



2 



' XQ-\-a J XQ-\-a—l J XQ-\-a- 

fX2+l , i'X2 rxo+a 



- 2 



XQ+a—l 



rX2+l , i'X2 rxo+a 

/ (/ W{y-z)(t)-\az-P)dz+ W{y - z)(j)-\z - a) dz 

J xo+a J xo+a J xo+a— 1 

rx2 X 
- / W{y - z)cl)-\z) dz)dy 

J Xo+a— 1 

rx2+l ... 1 _ _ _ z))cl>-\z) dz 

J xo+a Wa;o ^« « ^ 

+ / ° f>V(y - 2 - a) - >V(y - 2))<^"^(2) dz 

< ^ n { r {^(y - — ) - ^(y - 

+ /° f>V(y-2-a)->V(y-z))0-i(z)dz 

Ao-l ^ ^ 

/•xo+a-l 

+ / Wiy - z)(p-\z) dz\dy 

<Ci(l--) + C2- + C3a<C4a. (144) 
a a 

Here Ci, C2, C3, C4 are some constants that depend only on K and k. The last step follows from the 
facts that >V( • ) is a bounded Lipschitz function and (j)~^{z) < 2/k. for z G [xi,X2]. Also, note that 
in the first inequality, \{4>~^{y — 1)) — \{4>~^{y — 1)) < 0, since ^~^{y — 1) < ^a^iv — !)> and l( • ) is 
nondecreasing. 

• Bounding Evi;,2(^a) - ^w,2,i{4>)- 
We have 



Xo+a 

{\{w*ra\y))-Ua\y-imv 

^'Tx-ola-. (145) 



|•xo+a—^ 

+ / {\{W *ct>-\y))-\{ck-\y-mdy- 

J a 
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We treat each term separately. For the first term, 

(•xo+a 

{\{w*<p-Hy))-K<Pa\y-mdy 

xo+a— I 



/ { I / W{y- z)4>-\z) dz + W{y- z)^-\z) dz - \{4>-\y - 1)) \dy 

J Xo+a— 1 ^ \J XQ+a J Xo+a— 2 / 

xo+a / rxo+a 7 -I- R r\7 f^'o \ 

I / W{y - —^)ct>-\z) - + / W{y-a-z)r\z)dz]dy 

xo+a-l \Jxo a Jxo-2 J 

XQ 

\{r\y-my 

Xo-1 

fxo / rxo+a 7 -I- R Ay f^o \ 

/ I / w{y + a-—^)rHz)-+ W{y-z)rHz)dz)dy 

Jxo-l \Jxo " " Jxo-2 J 

fXQ 

- / \{r\y-my 

J xo-l 

(■XO + I 



< C5 a + 



fj^{\ (^"°^ W{y-z)<p-\z)dz^-\{^-\y-l))]dy 

C^a+ r |l(W*r'(y))-l(r'(y-l))|dy, (146) 
Jxo-l ^ 



where the last inequality is an application of remark E.l More specifically, 

/ fxo+a 7 -I- fl r\7 r^O 

I / W{y + a- —^)cP~\z) — + / Wiy- z)cp-\z) dz 

\Jxo " " Jxo-2 



(rxo+1 
/ W{y - z)(l)-\z) dz 
Jxo-2 



<J)»f / r^O+a 7 4- R r\7 + 1 

< ^ / W(y + a - -^)rH^) - - / W(y - z)cl>-\z) dz 
2 \Jxo « " Jxo 



2 Jxo+i " 



2 



Xo + l 



XO 



(w{y + a - - W(y - z)^ 4>-\z)dz 



< CUl - -) + Co- + < Cs a, 
a a 

where C[,C2,C'^,C5 are constants that depend only on k. Here, the penultimate inequality follows 
from a > 1, and the last one follows from the fact that W( • ) is a bounded Lipschitz function and 
that (j)^^{z) < 2/k, for z G [xi,X2]. 

To bound the second term on the right hand side of Eq. (146), notice that (paiz) = <p{z — a), for 
z G [—1 + a, xq + a), whereby 

pxo+a—l rXQ — l 

/ {l(W * -/.-^y)) - \{cp-\y - l))}dy = {l(W * ^Hy)) - Kr\y - l))}dy. (147) 

Ja Jo 
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Now, using Eqs. (142), (145) and (147), we obtain 

Ew,2(0a) - Ew,2,3(0) < Cs a - / {l(W * r\y)) - \{r\y - l))}dy 

Jxo 

<C5a+ log ' 



Xo 



(148) 



< C5 o + alog(— = Cq a 



where Cq is a constant that depends only on k. 
• Bounding Ew,3((/)a). 

Notice that (paiv) > cr"^- Therefore, \(W * (p~^{y)) < \{cr~^), since l( • ) is nondecreasing. Recall that 
(f'aiy) = (/)*< 2(T^, for y G [— 1, — 1 + a). Consequently, 



EwMa)< I {l(cT-^)-l(0*"')}dy< ^log(^) < ^log2 



(149) 



where the first inequality follows from Remark E.l 



Finally, we are in position to prove the proposition. Using Eqs. (|144|), (|148|) and (149), we get 

(150) 

□ 



Ew(0a) - Ew{(p) < C4 a + Ce a + - log 2 = C{k, K) a. 



F Proof of Claim [05 



Similar to the proof of Claim 
Co^ 12, where 



5.20 



the assumption 



"Idrr > Ca'^£ implies 



1)> 



Co 



'M 2 

Choose a small enough such that 0* < (pi. Let k = — — 0)/2. Applying Lemma 5.11 



there exists io, and do, such that, J^-^^ 
(l){fi£ -I) < 4>i, with 



dx < k£, for i > £q and a G (0, ctq]. We claim that 

V 1 + e 



Otherwise, by monotonicity of 4>{x 
(0i-,/.*)(l-^)£< 



-1 i-e-i 
\(j){x) - 4>*\ dx < / \(p{x) - 4>*\ dx < k£ 



Plugging in for fj, yield a contradiction. 

Therefore, < (t){x) < 4>i, for x £ [9£ - l,fi£ - 1], and {fi 

£ > max{4, 2K/{1 -9)} gives the result. 



(151) 

(1 - 9)£/2. Choosing 
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G Proof of Proposition |5.26 

To prove Eq. ( 116[ ), we write 



{Vroh{4'a{x)) - Vrob{(p{x))] 



-1 



< 



X2 r4>{x) 

/ V'{s) ds dx 

XI J 4>a{x) 

' ^ ' ^) dsdx 



2s2 



[s — a 



a 



a 



Xl J 4>a{x) 

5a 



dx 



where the second inequahty follows from the fact Ca'^/2 < (j){x), for x £ [xi,X2]- 



Next, we pass to prove Eq. (117) 
-1 



(a^(x)-a^) 



1 



1 



(j)a{x) (l){x] 



dx 



^2 ^2 



< 



Xl 

a 



a'{x) - a' 
4>(.x) 



1 - a 



1 - a 



X2 ^2 



-dx < K 



2a 



Xl 



C(l-a)' 



where the first inequality follows from Remark D.l 
Finally, we have 



Ew,rob(0a) - Ew,rob(<A) = / { I (W * .^"^ (y) ) - I (W * (y))}dy 



JW*(l>-^(y) 



mmse(s) ds dy 



< 



< 



JW*(j>-^(y) 

D{px) + e 



-(ir + 2)log(l-a) 



(152) 



(153) 



(154) 



where the first inequality follows from Eq. (113) and Claim 5.25 
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